Section Name Description
SIGARRA Course Info UC info
URL Self-assessment #1
File Introduction to Data Science (in Portuguese)
Assignment File DATASET
File Guideline for diagnostic (in Portuguese)
File Table with BMI information for Portugal (in Portuguese)
File Explanation about the PPA variable (in Portuguese)
File Tables of blood pressure for children (in English)
URL Paper #1 with results on a similar dataset (169 subjects)
URL Paper #2 with statistical results on the same dataset (7199 subjects)
File Data Analysis: some good practices
Homeworks URL To read #1: Apples-to-apples: the pitfalls of cross-validation
URL To read #2: The relationship between ROC and Precision-Recall curves
URL To read #3: refutation of the second paper
Python Books URL Python Data Science Handbook
URL Suggestions from python.org
URL O'Reilly Python books
URL Python resources
Theoretical Classes File Presentation
File Introduction to Data Mining
File Data understanding and manipulation
File Distances and dimensionality reduction (till slide 41, inclusive)
URL Recorded class
File Distances and dimensionality reduction (cont. from slide 42)
File Distances and dimensionality reduction (cont. from slide 55)
File Data imputation
File Data visualization
File Basic Concepts in Classification
File Basic Concepts in Classification: Decision Trees (from slide 15)
File Naive Bayes classifier
File Naive Bayes classifier (from slide 9) and Belief Networks
File Evaluating the Performance of a Classifier
File Evaluation Metrics
File Evaluating the Performance of a Classifier (from slide 10)
File Evaluation Metrics (revisited)
File Regression and KNN
File Python code associated with the regression and KNN slides
File Melbourne data associated with regression and KNN slides
File Support Vector Machines (SVM)
File A little more detail on SVMs

Section 2.6.1.4 of this dissertation gives a clear, detailed explanation of SVMs.

File Artificial Neural Networks
File Clustering
File Ensembles
File Basic Association Analysis
Practical Classes File Entropy revisited
File Distances revisited
File species.csv
URL Code for decision trees (iris, with pruning)
URL Code for naive Bayes (German credit dataset)
URL Decision boundaries
URL Performance Evaluation of Classifiers
URL Regression and Logistic Regression
URL SVM exercises
URL Hierarchical Clustering

Here is some Python code that applies hierarchical clustering to the iris dataset.

Explore the various clustering options, including k-means, k-means++ and DBSCAN. Identify the differences between these clustering methods.

Apply these methods to a dataset of your choice and evaluate the quality of the resulting clusters.
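
As a possible starting point for this exercise, the sketch below compares the methods on the iris dataset using scikit-learn. It is not the code from the class: the choice of three clusters, the DBSCAN parameters (eps=0.6, min_samples=5), the standardization step and the silhouette score as quality measure are all assumptions made for illustration, and other settings or metrics may suit your dataset better.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Standardize the features so that no single attribute dominates the distances.
X = StandardScaler().fit_transform(load_iris().data)

# Assumed settings: 3 clusters for the partitional/hierarchical methods,
# illustrative eps/min_samples for DBSCAN (tune them for your own data).
models = {
    "k-means (random init)": KMeans(n_clusters=3, init="random", n_init=10, random_state=0),
    "k-means++": KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0),
    "hierarchical (Ward)": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "DBSCAN": DBSCAN(eps=0.6, min_samples=5),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # DBSCAN marks noise points with the label -1; exclude them from the score.
    clustered = labels != -1
    n_clusters = len(set(labels[clustered]))
    n_noise = int((~clustered).sum())
    # The silhouette score is only defined for at least two clusters.
    if n_clusters > 1:
        score = silhouette_score(X[clustered], labels[clustered])
    else:
        score = float("nan")
    results[name] = (n_clusters, n_noise, score)
    print(f"{name:24s} clusters={n_clusters} noise={n_noise} silhouette={score:.3f}")
```

Note that k-means and hierarchical clustering take the number of clusters as input, while DBSCAN infers it from the density parameters and may leave some points unassigned, so the scores are not strictly comparable across methods.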

Folder PPT_to_PDF