UC info
Data Mining I
Code: | CC4018 | Acronym: | CC4018 | Level: | 400 |
Keywords
Classification | Keyword |
---|---|
Official | Computer Science |
Instance: 2024/2025 - 1S
Active? | Yes |
Responsible unit: | Department of Computer Science |
Course/CS Responsible: | Master in Computer Science |
Cycles of Study/Courses
Acronym | No. of Students | Study Plan | Curricular Years | Credits UCN | Credits ECTS | Contact hours | Total Time |
---|---|---|---|---|---|---|---|
M:CC | 16 | Study plan since 2014/2015 | 1 | - | 6 | 42 | 162 |
M:ERSI | 4 | Official Study Plan since 2021_M:ERSI | 1 | - | 6 | 42 | 162 |
Teaching Staff - Responsibilities
Teacher | Responsibility |
---|---|
Inês de Castro Dutra | |
Teaching - Hours
Theoretical and practical: | 3.23 |
Type | Teacher | Classes | Hours |
---|---|---|---|
Theoretical and practical | Totals | 1 | 3.231 |
Theoretical and practical | Inês de Castro Dutra | | 3.231 |
Teaching language
Suitable for English-speaking students
Objectives
The main objectives of this unit are to introduce the principal data science methodologies and to teach programming and tools for data processing and analysis, such as the Python language.
Learning outcomes and competences
This unit should provide students with:
1. theoretical competence in several basic data science methodologies;
2. competence in developing software for data science tasks;
3. practical competence in applying data science techniques to specific problems.
Working method
In-person
Program
1. Introduction to Data Science:
• the CRISP-DM model
• data, models and patterns
• data science tasks
2. Data Pre-Processing:
• importing data
• cleaning data
• transforming and creating variables
• dimensionality reduction techniques
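To illustrate the cleaning and transformation steps listed above, here is a minimal sketch in plain Python (standard library only). It fills in missing numeric values with the column mean, standardizes that column to z-scores, and one-hot encodes a categorical column. The toy records and the helper names (`impute_mean`, `standardize`, `one_hot`) are hypothetical and do not come from the course material. In the course itself these steps would more likely be done with a data-analysis library.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

def one_hot(categories):
    """Encode each category as a 0/1 indicator vector; levels are sorted alphabetically."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]

# Hypothetical toy data: an "age" column with one missing value and a "city" column.
ages = [25, None, 35, 40]
cities = ["Porto", "Lisboa", "Porto", "Braga"]

ages_clean = impute_mean(ages)   # the missing age becomes the mean of 25, 35 and 40
ages_z = standardize(ages_clean)
levels, city_vectors = one_hot(cities)
print(levels)        # ['Braga', 'Lisboa', 'Porto']
print(city_vectors)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Mean imputation is the simplest choice. It preserves the column mean but shrinks its variance, which is one reason other imputation strategies exist.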
3. Exploring and Visualizing Data
• data summarization
• data visualization
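For data summarization and a very rough form of visualization, the following sketch uses the Python standard library. It computes common descriptive statistics and prints a text bar chart of category frequencies. The data values and the helper names `summarize` and `text_histogram` are hypothetical. Plotting libraries would normally replace the text chart.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points ('exclusive' method by default)
    return {
        "count": len(values),
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation
    }

def text_histogram(labels):
    """Print a crude bar chart of category frequencies, most frequent first."""
    for label, count in Counter(labels).most_common():
        print(f"{label:<10} {'#' * count} ({count})")

# Hypothetical example data.
grades = [12, 15, 9, 17, 14, 11, 16, 13]
stats = summarize(grades)
print(stats["median"], stats["min"], stats["max"])  # 13.5 9 17
text_histogram(["setosa", "virginica", "setosa", "versicolor", "setosa"])
```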
4. Descriptive Models
• clustering methods: partitional methods, hierarchical methods
• association rules
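The two descriptive-model families above can be sketched in plain Python. The first part is a basic partitional method, k-means (Lloyd's algorithm), initialized with the first k points for reproducibility. The second part computes the support and confidence measures used to score association rules. It does not implement a full rule-mining algorithm such as Apriori. All data and function names here are hypothetical illustrations.

```python
from math import dist

def kmeans(points, k, max_iter=100):
    """Plain k-means: alternate assignment and centroid-update steps until stable."""
    centroids = list(points[:k])  # naive deterministic initialization
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters

def support(itemset, transactions):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # two groups of three points each

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support({"bread", "milk"}, baskets))       # 0.5
print(confidence({"bread"}, {"milk"}, baskets))  # 0.666...
```

Initializing with the first k points keeps the example deterministic. In practice, random or k-means++ initialization with several restarts is more common, because k-means only finds a local optimum.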
5. Predictive Models
• classification and regression tasks
• evaluation metrics
• linear regression, naive Bayes, k-nearest neighbours
• tree-based models: classification and regression trees, pruning methods
• neural networks and deep learning
• support vector machines
• ensembles: bagging, random forests, boosting, AdaBoost, XGBoost
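As a small illustration of a predictive model and an evaluation metric from the list above, here is a from-scratch k-nearest neighbours classifier with accuracy as the metric. Ties in the vote go to the label encountered first among the sorted neighbours. The training points, labels and function names are hypothetical and are meant only as a teaching sketch.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbours = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical two-class training data.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X = [(1.1, 0.9), (5.1, 5.0)]
test_y = ["a", "b"]

preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(preds, accuracy(test_y, preds))  # ['a', 'b'] 1.0
```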
6. Methodologies for Evaluating and Comparing Models
• evaluation measures
• estimation methods
• significance tests
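One common estimation method is k-fold cross-validation. The sketch below shows it in plain Python with a deliberately trivial majority-class model as the learner, so that the focus stays on the estimation procedure. The function names, the `fit` callback convention (a function taking the training data and returning a predictor) and the toy labels are assumptions made for illustration. The resulting per-fold scores are the kind of paired measurements that significance tests for comparing models are applied to.

```python
import random
from collections import Counter
from statistics import mean, stdev

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of (nearly) equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_majority(train_X, train_y):
    """Baseline learner: always predict the most frequent training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda x: label

def cross_validate(X, y, fit, k=5, seed=0):
    """Return per-fold accuracies of the model produced by `fit` under k-fold CV."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(model(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores

# Hypothetical imbalanced data: 14 examples of class "a", 6 of class "b".
X = list(range(20))
y = ["a"] * 14 + ["b"] * 6
scores = cross_validate(X, y, fit_majority, k=5)
print(len(scores), round(mean(scores), 2))  # 5 0.7
print(round(stdev(scores), 3))              # spread across folds
```

With equal-sized folds, the mean fold accuracy of this baseline equals the share of the majority class (0.7). This is a useful reference point for any real model evaluated the same way.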
Mandatory literature
Pang-Ning Tan; Introduction to Data Mining. ISBN: 9780321420527
Charu C. Aggarwal; Data Mining. ISBN: 978-3-319-14142-8
Jiawei Han; Data Mining. ISBN: 978-0-12-381479-1
Complementary Bibliography
Peter Flach; Machine Learning. ISBN: 978-1-107-42222-3
Andriy Burkov; The Hundred-Page Machine Learning Book, 2019. ISBN: 978-1999579500
Luís Torgo; Data Mining with R. ISBN: 978-1-4398-1018-7
Teaching methods and learning activities
Lectures present the topics of the syllabus orally and illustrate them with concrete data mining case studies.
keywords
Technological sciences > Technology > Information technology
Physical sciences > Computer science > Modelling tools
Physical sciences > Computer science > Informatics > Applied informatics
Technological sciences > Technology > Computer technology > Software technology
Evaluation Type
Distributed evaluation with final exam
Assessment Components
Designation | Weight (%) |
---|---|
Practical or project work | 20.00 |
Exam | 70.00 |
Presentation/discussion of a scientific work | 10.00 |
Total: | 100.00 |
Amount of time allocated to each course unit
Designation | Time (hours) |
---|---|
Project development | 35.00 |
Autonomous study | 84.00 |
Presentation/discussion of a scientific work | 1.00 |
Class attendance | 42.00 |
Total: | 162.00 |
Eligibility for exams
Calculation formula of final grade
Assessment is distributed and consists of two tests during the semester, a final exam, a practical assignment due at the end of the semester, and a presentation.
The final grade is the weighted average of the theoretical, practical and presentation grades, computed as:
NF = 0.7 * max((T1+T2),Ex) + 0.2 * TP + 0.1 * AP
where:
NF is the final grade,
T1 is the grade for Test 1,
T2 is the grade for Test 2,
Ex is the grade for the Final Exam,
TP is the grade for the Practical Assignment and
AP is the grade for the presentation.
Students who do not obtain at least 30% (i.e. 6 out of 20) in each component fail the course.
Students may pass the course on the basis of their test and assignment grades alone. In that case, they may still take the final exam (regular or appeal season) to improve their grade. Students who do not obtain a passing grade from the tests and assignment alone can pass in either of the two exam seasons.
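The grading rule can be checked with a few lines of Python. The sketch below implements NF = 0.7 * max(T1 + T2, Ex) + 0.2 * TP + 0.1 * AP and the 30% minimum. It makes two assumptions not stated explicitly above. First, T1 + T2 is itself on the 0-20 scale, since the formula compares it directly with Ex. Second, the "components" subject to the minimum are the theoretical grade max(T1 + T2, Ex), TP and AP. The function names and example grades are hypothetical.

```python
def final_grade(t1, t2, ex, tp, ap):
    """NF = 0.7 * max(T1 + T2, Ex) + 0.2 * TP + 0.1 * AP.

    Assumption: T1 + T2, Ex, TP and AP are all on a 0-20 scale.
    """
    theoretical = max(t1 + t2, ex)  # the better of the two tests combined or the exam
    return 0.7 * theoretical + 0.2 * tp + 0.1 * ap

def meets_minimums(t1, t2, ex, tp, ap, minimum=6.0):
    """True if every component reaches 30% of 20, i.e. 6 points.

    Assumption: the components are max(T1 + T2, Ex), TP and AP.
    """
    theoretical = max(t1 + t2, ex)
    return all(component >= minimum for component in (theoretical, tp, ap))

# Hypothetical grades: T1 = 7, T2 = 6.5 (T1 + T2 = 13.5), Ex = 12, TP = 16, AP = 15.
nf = final_grade(7, 6.5, 12, 16, 15)
print(round(nf, 2), meets_minimums(7, 6.5, 12, 16, 15))  # 14.15 True
```

Because of the max, a strong exam result replaces a weaker combined test grade, but never the other way round.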
Examinations or Special Assignments
The tests take place during class time, one in the middle and one at the end of the semester.
The practical assignment is announced at the beginning of the semester and should be completed by the end of the semester.
Special assessment (TE, DA, ...)
Assessment in the special exam season follows the same scheme as the continuous assessment, with a final exam worth 70% of the grade. Students may optionally submit a project to obtain the remaining 30%. This project must be delivered and presented within the special-season calendar.
Classification improvement
The practical assignment grade cannot be improved.
Students can improve their theoretical grade by taking either exam (regular or appeal season).
Observations
All course material (e.g. slides, recommended books) is in English, and classes are also taught in English when foreign students are enrolled.
Course material is made available on the course's Moodle page.