Data Mining A.A. 2021/22

DM1 - Data Mining: Foundations (6 CFU)

Instructors:

Teaching Assistant

DM2 - Data Mining: Advanced Topics and Applications (6 CFU)

Instructors:

Teaching Assistant

News

Learning Goals

Hours and Rooms

DM1

Classes

Day of Week Hour Room
Monday 11:00 - 13:00 Aula C / MS Teams
Thursday 11:00 - 13:00 Aula A1 / MS Teams

Office Hours:

DM2

Classes

Day of Week Hour Room
Monday 11:00 - 13:00 MS Teams
Thursday 11:00 - 13:00 MS Teams

Office Hours:

Learning Material

Textbook

Slides

Software

Class Calendar (2021/2022)

First Semester (DM1 - Data Mining: Foundations)

Day Room Topic Learning material Recording Instructor
1. 16.09.2021 11:00-12:45 Aula Fib A1 Introduction. Introducing DM1 Project-work guidelines (updated 22.11.2021) Lecture 1 Pedreschi
2. 20.09.2021 11:00-12:45 Aula Fib C Course overview Overview of contents Lecture 2 Pedreschi
3. 23.09.2021 11:00-12:45 Aula Fib A1 Data Understanding Slides Lecture 3 Pedreschi
4. 27.09.2021 11:00-12:45 Aula Fib C Data Preparation Slides Lecture 4 Pedreschi
5. 30.09.2021 11:00-12:45 Aula Fib A1 Lab: Data Understanding & Preparation – Python Python Introduction Dataset: Iris Hands-On Python (Iris) Lecture 5 Citraro
6. 04.10.2021 11:00-12:45 Aula Fib C Lab: Data Understanding & Preparation – Python (cont.) & KNIME Dataset: Titanic Hands-On Python (Titanic), Titanic DU+DP (complete) KNIME: Intro, KNIME DU+DP Lecture 6 Citraro
7. 07.10.2021 11:00-12:45 Aula Fib A1 Clustering: Intro & K-means Clustering intro and k-means [revised version] Lecture 7 Nanni
11.10.2021 11:00-12:45 Aula Fib C
8. 14.10.2021 11:00-12:45 Aula Fib A1 Clustering: k-means Lecture 8 Nanni
18.10.2021 11:00-12:45 Aula Fib C
9. 21.10.2021 11:00-12:45 Aula Fib A1 Clustering: Hierarchical methods Clustering: Hierarchical Methods Lecture 9 Nanni
10. 25.10.2021 11:00-12:45 Aula Fib C Clustering: density-based methods & exercises Clustering: Density-based methods Lecture 10 Nanni
11. 28.10.2021 11:00-12:45 Aula Fib A1 Lab: Clustering Python Hands-On Clust. (Iris) Python Titanic Knime Lecture 11 Citraro
12. 04.11.2021 11:00-12:45 Aula Fib A1 Classification: intro and decision trees Classification and decision trees (updated 11.11.2021) Lecture 12 Nanni
13. 08.11.2021 11:00-12:45 Aula Fib C Classification: decision trees/2 Lecture 13 Nanni
14. 11.11.2021 11:00-12:45 Aula Fib A1 Classification: decision trees/3 Lecture 14 Nanni
15. 15.11.2021 11:00-12:45 Aula Fib C Classification: decision trees/4 Lecture 15 Nanni
16. 18.11.2021 11:00-12:45 Aula Fib A1 Classification: decision trees exercises Exercise Lecture 16 Nanni
17. 22.11.2021 11:00-12:45 Aula Fib C Lab: Classification knime_classification Hands_on_Python_Titanic Python_Iris Online: TBD Lecture 17 (offline) Citraro
18. 25.11.2021 11:00-12:45 Aula A1 Pattern Mining - 1 Slides Lecture 18 Pedreschi
19. 29.11.2021 11:00-12:45 Aula C Pattern Mining - 2 Lecture 19 Pedreschi
20. 02.12.2021 11:00-12:45 Aula A1 Lab: Pattern Mining Apriori Exercise Hands_on_Python_Titanic KNIME Lecture 20 Citraro

Second Semester (DM2 - Data Mining: Advanced Topics and Applications)

Day Room Teams Topic Learning material Instructor Recordings
01. 14.02.2022 11:00–13:00 C Introduction, CRISP, Evaluation, KNN Intro, CRISP, Eval, KNN, Notebook_KNN_Eval Guidotti link
02. 17.02.2022 11:00–13:00 A1 Imbalanced Learning, Evaluation ImbLearn Eval, ImbLearn Guidotti link
03. 21.02.2022 11:00–13:00 C Dimensionality Reduction DimRed, Notebook_DimRed Guidotti link
04. 24.02.2022 11:00–13:00 A1 Outlier Detection (part 1) Outlier Detection, Notebook_OutlierDetection Guidotti link
05. 28.02.2022 11:00–13:00 C Outlier Detection (part 2) Outlier Detection, Notebook_OutlierDetection Guidotti link
06. 03.03.2022 11:00–13:00 A1 Outlier Detection (part 3) Outlier Detection, Notebook_OutlierDetection Guidotti link
07. 07.03.2022 11:00–13:00 C Naive Bayes Classifier, Linear Regression NBC, Notebook_NBC, LinReg Guidotti link
08. 10.03.2022 11:00–13:00 A1 Linear Regression, Gradient Descent, Maximum Likelihood Estimation, Odds LinReg, GradDes, MLE, Odds Guidotti link
09. 14.03.2022 11:00–13:00 C Logistic Regression, Support Vector Machines LogReg, SVM, Notebook_LR, Notebook_SVM Guidotti link
10. 17.03.2022 11:00–13:00 A1 Linear and Logistic Perceptron Perceptron Guidotti link1, link2
11. 21.03.2022 11:00–13:00 C Neural Networks NeuralNetwork, Notebook_NN, Notebook_NN_impl Guidotti link
12. 24.03.2022 11:00–13:00 A1 Ensemble Classifiers, Bagging, Random Forest EnsembleClassifiers, Notebook_ENS Guidotti link
13. 28.03.2022 11:00–13:00 C Boosting, Gradient Boost GBM Guidotti link
14. 31.03.2022 11:00–13:00 A1 XGBoost, LightGBM GBM, Notebook_GBM Guidotti link
15. 04.04.2022 11:00–13:00 C Time Series Introduction, Distance Functions TS_Intro_Distances, Notebook_TS_Sim, Notebook_TS_DTW_Impl, Notebook_TS_DTW_Constr_Impl Guidotti link
16. 07.04.2022 11:00–13:00 A1 Time Series Approximations, Clustering TS_Approx_Clustering, Notebook_TS_ApproxClus Guidotti link
17. 11.04.2022 11:00–13:00 C Time Series Motifs, Discord, Matrix Profile TS_MatrixProfile, TS_MatrixProfile Guidotti link
18. 14.04.2022 11:00–13:00 A1 Time Series Classification TS_Classification, Notebook_TSC, Notebook_TSC_SoA Guidotti link
19. 21.04.2022 11:00–13:00 A1 Sequential Pattern Mining SPM Guidotti link
20. 28.04.2022 11:00–13:00 A1 Sequential Pattern Mining SPM, Notebook_SPM Guidotti link
21. 02.05.2022 11:00–13:00 C Advanced Clustering Approaches Advanced_Clustering, Notebook_AC Guidotti link
22. 05.05.2022 11:00–13:00 A1 Transactional Clustering Transactional Clustering, Notebook_TC Guidotti link
23. 09.05.2022 11:00–13:00 C Explainable Artificial Intelligence Explainability, Notebook_XAI Guidotti link
24. 12.05.2022 11:00–13:00 A1 Explainable Artificial Intelligence Explainability, Notebook_XAI Guidotti link

Exams

Exam DM1

The exam is composed of two parts:

Project 1

  1. Assigned: 30/09/2021
  2. Midterm Deadline: 21/11/2021 (half of the project required, i.e., Data Understanding & Preparation and at least 2 clustering algorithms)
  3. Final Deadline: 14/01/2022 (complete project required)
  4. Data: choose between Glasgow Norms and Seismic Bumps

Project 2

  1. Assigned: After Project 1 Final Deadline
  2. Deadline: one week before the oral exam

Exam DM part II (DMA)

Exam Rules

Exam Booking Periods

Exam Booking Agenda

For online exams, the camera must remain on and you must be able to share your screen. Online exams may also require the use of the Miro platform (https://miro.com/app/dashboard/).

The exam is composed of two parts:

Project Guidelines

N.B. When “solving the classification task”, remember (i) to test, where needed, different criteria for estimating the parameters of the algorithms, and (ii) to evaluate the classifiers (e.g., Accuracy, F1, Lift Chart) so that the results obtained with an imbalanced-learning technique can be compared against those obtained on the “original” dataset.
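
As an illustration of point (ii), the following is a minimal sketch, not part of the official project guidelines. The label lists `y_true`, `pred_original` and `pred_rebalanced` are hypothetical stand-ins for real classifier output, and the helper functions are written by hand for clarity. It shows why more than one metric is useful when comparing a classifier trained on the original data with one trained after rebalancing.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of records classified correctly."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall on the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced test set: 8 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# Hypothetical predictions of a model trained on the original data:
# it always predicts the majority class.
pred_original = [0] * 10

# Hypothetical predictions of a model trained after rebalancing:
# it finds both positives at the cost of two false alarms.
pred_rebalanced = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

for name, pred in [("original", pred_original), ("rebalanced", pred_rebalanced)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f}, "
          f"F1={f1_score(y_true, pred):.2f}")
```

In this toy case both models reach the same accuracy (0.80), while F1 on the minority class goes from 0.00 to about 0.67, so reporting accuracy alone would hide the effect of the rebalancing.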

Exam Dates

Exam Sessions

Session Date Time Room Notes Marks
1. 11.01.2022 14:00 - 18:00 MS Teams Please, use the system for registration: https://esami.unipi.it/
3. 07.06.2022 Please, use the system for registration: https://esami.unipi.it/
4. 28.06.2022 Please, use the system for registration: https://esami.unipi.it/
5. 19.07.2022 Please, use the system for registration: https://esami.unipi.it/
6. 05.09.2022 Please, use the system for registration: https://esami.unipi.it/

Past Exams

Reading About the "Data Scientist" Job

… a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the “sexiest” around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them.

Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.

Previous years