Section Name Description
SIGARRA Course Info UC info
Link/URL Self-assessment #1
File Introduction to Data Science (in Portuguese)
Assignment File DATASET
File Guideline for diagnostic (in Portuguese)
File Table with BMI information for Portugal (in Portuguese)
File Explanation about the PPA variable (in Portuguese)
File Tables of blood pressure for children (in English)
Link/URL Paper #1 with results on a similar dataset (169 subjects)
Link/URL Paper #2 with statistical results on the same dataset (7199 subjects)
File Data Analysis: some good practices
Homeworks Link/URL To read #1: Apples-to-apples: the pitfalls of cross-validation
Link/URL To read #2: The relationship between ROC and Precision-Recall curves
Link/URL To read #3: refutation of the second paper
Python Books Link/URL Python Data Science Handbook
Link/URL Suggestions from python.org
Link/URL O'Reilly Python books
Link/URL Python resources
Theoretical Classes File Presentation
File Introduction to Data Mining
File Data understanding and manipulation
File Distances and dimensionality reduction (till slide 41, inclusive)
Link/URL Recorded class
File Distances and dimensionality reduction (cont. from slide 42)
File Distances and dimensionality reduction (cont. from slide 55)
File Data imputation
File Data visualization
File Basic Concepts in Classification
File Basic Concepts in Classification: Decision Trees (from slide 15)
File Naive Bayes classifier
File Naive Bayes classifier (from slide 9) and Belief Networks
File Evaluating the Performance of a Classifier
File Evaluation Metrics
File Evaluating the Performance of a Classifier (from slide 10)
File Evaluation Metrics (revisited)
File Regression and KNN
File Python code associated with the regression and KNN slides
File Melbourne data associated with regression and KNN slides
File Support Vector Machines (SVM)
File A little more detail on SVMs

Section 2.6.1.4 of this dissertation has a clear and detailed explanation of SVMs.

File Artificial Neural Networks
File Clustering
File Ensembles
File Basic Association Analysis
Practical Classes File Entropy revisited
File Distances revisited
File species.csv
Link/URL Code for decision trees (iris, with pruning)
Link/URL Code for naive Bayes (German credit dataset)
Link/URL Decision boundaries
Link/URL Performance Evaluation of Classifiers
Link/URL Regression and Logistic Regression
Link/URL SVM exercises
Link/URL Hierarchical Clustering

Here is some Python code that applies hierarchical clustering to the iris dataset.

Explore the various clustering options, including k-means, k-means++ and DBSCAN. Identify the differences between these clustering methods.

Apply these methods and evaluate the quality of the generated clusters using your favorite dataset.
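
As a possible starting point for this exercise, here is a minimal sketch that runs the methods named above on the iris dataset and scores the resulting clusters. It is not the code linked above. It assumes scikit-learn is installed; the function name compare_clusterings, the choice of three clusters, the feature standardization, and the DBSCAN settings (eps=0.6, min_samples=5) are illustrative assumptions and should be tuned for your own dataset.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler


def compare_clusterings(X, y_true, n_clusters=3, random_state=0):
    """Fit several clustering methods and return simple quality scores.

    Hypothetical helper written for this exercise, not part of the course code.
    """
    # Standardize features so that no single feature dominates the distances.
    X_scaled = StandardScaler().fit_transform(X)

    models = {
        "k-means (random init)": KMeans(
            n_clusters=n_clusters, init="random", n_init=10,
            random_state=random_state),
        "k-means++": KMeans(
            n_clusters=n_clusters, init="k-means++", n_init=10,
            random_state=random_state),
        "hierarchical (Ward)": AgglomerativeClustering(
            n_clusters=n_clusters, linkage="ward"),
        # eps and min_samples are assumed values, chosen only for illustration.
        "DBSCAN": DBSCAN(eps=0.6, min_samples=5),
    }

    results = {}
    for name, model in models.items():
        labels = model.fit_predict(X_scaled)
        distinct = set(labels.tolist())
        # DBSCAN marks noise points with the label -1; do not count it
        # as a cluster.
        found = len(distinct) - (1 if -1 in distinct else 0)
        # The silhouette score is only defined for 2..n-1 distinct labels.
        # Note that DBSCAN noise points are treated here as one extra group.
        if 2 <= len(distinct) < len(labels):
            silhouette = silhouette_score(X_scaled, labels)
        else:
            silhouette = float("nan")
        results[name] = {
            "n_clusters": found,
            "silhouette": silhouette,  # internal measure, no labels needed
            "ari": adjusted_rand_score(y_true, labels),  # uses true labels
        }
    return results


if __name__ == "__main__":
    iris = load_iris()
    for name, scores in compare_clusterings(iris.data, iris.target).items():
        print(f"{name:24s} clusters={scores['n_clusters']} "
              f"silhouette={scores['silhouette']:.3f} ari={scores['ari']:.3f}")
```

The silhouette score needs no ground truth, so it also works on unlabeled data, while the adjusted Rand index is only available when true classes are known, as with iris. When you switch to your favorite dataset, replace load_iris with your own loader and revisit n_clusters, eps and min_samples.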

Folder PPT_to_PDF