These are the differences between the selected revision and the current version of the page.
dm:start [16/06/2019 at 16:22 (5 years ago)] Mirco Nanni [News] |
dm:start [22/04/2024 at 13:41 (39 hours ago)] (current version) Riccardo Guidotti [Second Semester (DM2 - Data Mining: Advanced Topics and Applications)] |
||
---|---|---|---|
Line 9: | Line 9: | ||
ga(' | ga(' | ||
ga(' | ga(' | ||
- | ga(' | + | ga(' |
- | | + | |
ga(' | ga(' | ||
- | ga(' | + | ga(' |
setTimeout(" | setTimeout(" | ||
</ | </ | ||
<!-- End Google Analytics --> | <!-- End Google Analytics --> | ||
+ | <!-- Global site tag (gtag.js) - Google Analytics --> | ||
+ | <script async src=" | ||
+ | < | ||
+ | window.dataLayer = window.dataLayer || []; | ||
+ | function gtag(){dataLayer.push(arguments); | ||
+ | gtag(' | ||
+ | |||
+ | gtag(' | ||
+ | </ | ||
<!-- Capture clicks --> | <!-- Capture clicks --> | ||
< | < | ||
Line 42: | Line 50: | ||
</ | </ | ||
</ | </ | ||
- | ====== Data Mining A.A. 2018/19 ====== | + | ====== Data Mining A.A. 2023/24 ====== |
- | ===== DM 1: Foundations of Data Mining (6 CFU) ===== | + | ===== DM1 - Data Mining: Foundations |
- | Instructors | + | Instructors: |
* **Dino Pedreschi** | * **Dino Pedreschi** | ||
- | * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa | + | * KDDLab, Università di Pisa |
* [[http:// | * [[http:// | ||
* [[dino.pedreschi@unipi.it]] | * [[dino.pedreschi@unipi.it]] | ||
- | Teaching assistant: | ||
* **Riccardo Guidotti** | * **Riccardo Guidotti** | ||
- | * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa | + | * KDDLab, Università di Pisa |
- | * [[guidotti@di.unipi.it]] | + | * [[https:// |
- | + | * [[riccardo.guidotti@di.unipi.it]] | |
- | | + | |
- | ===== DM 2: Advanced topics on Data Mining and case studies | + | Teaching Assistant |
+ | * **Andrea Fedele** | ||
+ | * KDDLab, Università di Pisa | ||
+ | * [[https:// | ||
+ | * [[andrea.fedele@phd.unipi.it]] | ||
+ | ===== DM2 - Data Mining: Advanced Topics | ||
Instructors: | Instructors: | ||
- | * **Mirco Nanni, Dino Pedreschi** | ||
- | * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa | ||
- | * [[http:// | ||
- | * [[mirco.nanni@isti.cnr.it]] | ||
- | * [[dino.pedreschi@unipi.it]] | ||
- | |||
- | ===== DM: Data Mining (9 CFU) ===== | ||
- | |||
- | Instructors: | ||
- | * **Dino Pedreschi, Anna Monreale** | ||
- | * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa | ||
- | * [[http:// | ||
- | * [[mirco.nanni@isti.cnr.it]] | ||
- | * [[dino.pedreschi@unipi.it]] | ||
- | * [[anna.monreale@unipi.it]] | ||
- | |||
- | Teaching assistant: | ||
* **Riccardo Guidotti** | * **Riccardo Guidotti** | ||
- | * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa | + | * KDDLab, Università di Pisa |
- | * [[guidotti@di.unipi.it]] | + | * [[https:// |
+ | * [[riccardo.guidotti@di.unipi.it]] | ||
- | ====== News ===== | + | Teaching Assistant |
- | * **Results of DM2 2nd mid-term exam are out: {{ : | + | * **Andrea Fedele** |
- | * In preparation for the 2nd mid-term of DM2, you can find the exam of last year and its solutions: {{ : | + | * KDDLab, Università di Pisa |
- | * Due to a strike, the lesson of May 31st 2019 will not take place. In the course calendar you can find some additional material that might be useful to you, especially for the project. \\ There are no other classes scheduled, so the course is now officially closed. Thanks to all my students for attending, and good luck with the exam. | + | * [[https://www.linkedin.com/in/andrea-fedele/? |
- | * The project for DM2 is out! | + | * [[andrea.fedele@phd.unipi.it]] |
- | * Results of DM2 mid-term exam are out: {{ : | + | * Meeting: https://calendly.com/andreafedele/ |
- | * Last exam session on Feb, 14. Please register your name here: https://doodle.com/poll/6dgc5du4fgpnbyyx | + | ====== News ====== |
- | * Results of the written exam of Feb {{ : | + | |
- | * Results of the written exam of January {{: | + | |
- | * Dates for exam registration: | + | |
- | * ** I set up 3 days for the oral exam: 25, 28, 29 January. Other dates will be available after the written exam of Feb. To book your oral exam, please use the doodle indicating your Surname and Name: https:// | + | |
- | * Final results including project evaluation available here: {{ : | + | |
- | * **New project is available!** | + | |
- | * *Results of the {{ : | + | |
- | * Get clusters from scipy dendrogram: https://docs.scipy.org/doc/scipy/ | + | |
- | * Help for installing Pyfim library https:// | + | |
- | * *Results of the {{ : | + | |
- | * Students need to decide the group composition for the project and fill in this [[https:// | + | |
- | + | * **[19.01.2024]** DM2 lectures will start on Mon 19/02; for that lecture only, the time will be 14-16 instead of 9-11. |
- | + | * [13.10.2023] To schedule a meeting with the Teaching Assistant, you can use: https:// |
- | ====== Learning | + | * [20.09.2023] Recordings of the lectures can be found on the web pages of the course for the years 2020/2021 and 2021/2022 (see links at the bottom of this page) |
+ | * [20.09.2023] Thursday 21 September there will be no lecture. | ||
+ | * [11.09.2023] Lectures will start on Monday 18 September 2023 at 11.00 room C1. | ||
+ | * [11.09.2023] Lectures will be held in person only. Recordings of the lectures of past years can be found at the bottom of this web page. | ||
+ | * [11.09.2023] Project Groups [[https:// | ||
+ | * [11.09.2023] MS Teams [[https:// | ||
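The earlier news item about getting clusters from a SciPy dendrogram refers to cutting a hierarchical clustering into flat clusters. A minimal sketch of that step, assuming NumPy and SciPy are installed; the toy points, the Ward linkage and the two cut values (`t=2` clusters, height `t=3.0`) are illustrative choices, not taken from the course material:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Illustrative toy data (hypothetical): two well-separated groups of 2-D points
X = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2],
])

# Build the hierarchy (the merge tree a dendrogram plots) with Ward linkage
Z = linkage(X, method="ward")

# Option 1: cut the dendrogram so that at most 2 flat clusters remain
labels_by_count = fcluster(Z, t=2, criterion="maxclust")

# Option 2: cut the dendrogram at a fixed height (merge-distance threshold)
labels_by_height = fcluster(Z, t=3.0, criterion="distance")

print(labels_by_count)
print(labels_by_height)
```

`fcluster` returns one integer label per input row, starting at 1. Which group receives which number depends on the linkage, so compare the induced partitions rather than raw label values; `criterion="maxclust"` fixes the number of clusters, while `criterion="distance"` fixes the height at which the dendrogram is cut.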
+ | ====== Learning | ||
+ | * DM1 | ||
+ | * Fundamental concepts of data knowledge and discovery. | ||
+ | * Data understanding | ||
+ | * Data preparation | ||
+ | * Clustering | ||
+ | * Classification | ||
+ | * Pattern Mining and Association Rules | ||
+ | * Sequential Pattern Mining | ||
- | ** ... a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician | + | |
+ | * Outlier Detection | ||
+ | * Dimensionality Reduction | ||
+ | * Regression | ||
+ | * Advanced Classification | ||
+ | * Time Series Analysis | ||
+ | * Transactional Clustering | ||
+ | * Explainability | ||
- | //Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.// | + | ====== Hours and Rooms ====== |
- | + | ||
- | The wide availability of data from relational databases, | + | |
- | - the basic concepts of the knowledge extraction process: data study and preparation, data forms, data measures and similarity; | + | |
- | - the main data mining techniques (association rules, | + | |
- | - some case studies in marketing and customer relationship management, fraud detection, and epidemiological studies. | + | |
- | - the last part of the course aims to introduce the privacy and ethical issues inherent in the use of data inference techniques, of which the analyst must be aware | + | |
- | + | ||
- | ===== Reading about the "data scientist" | + | |
- | + | ||
- | * Data, data everywhere. The Economist, Feb. 2010 {{: | + | |
- | * Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011 [[http:// | + | |
- | * Welcome to the yotta world. The Economist, Sept. 2011 {{: | + | |
- | * Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012 [[http:// | + | |
- | * Il futuro è già scritto in Big Data. Il Sole 24 Ore, Sept 2012 [[http:// | + | |
- | * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{: | + | |
- | * Peter Sondergaard, | + | |
- | + | ||
- | * Towards Effective Decision-Making Through Data Visualization: | + | |
- | ====== Hours and Rooms ====== | + |
- | ===== DM1 & DM ===== | + | ===== DM1 ===== |
- | **Classes | + | **Classes** |
^ Day of Week ^ Hour ^ Room ^ | ^ Day of Week ^ Hour ^ Room ^ | ||
- | | | + | | Monday |
- | | | + | | Wednesday |
- | Friday | + |
**Office hours: | **Office hours: | ||
- | * Prof. Pedreschi: Monday | + | * Prof. Pedreschi |
- | * Prof. Monreale: by appointment, | + | * Monday |
- | * Dr. Guidotti: class-appointment (see calendar) | + | * Online |
+ | * Prof. Guidotti | ||
+ | * Tuesday 16:00 - 18:00 or Appointment | ||
+ | * Room 363 Dept. of Computer Science | ||
| | ||
===== DM 2 ===== | ===== DM 2 ===== | ||
- | **Classes | + | **Classes** |
- | ^ Day of week | + | ^ Day of Week |
- | | Thursday | + | | |
- | | Friday | + | | |
- | **Office | + | **Office |
+ | |||
+ | * Tuesday 15.00-17.00 or Appointment by email | ||
+ | * Room 363 Dept. of Computer Science or MS Teams | ||
- | * Nanni : appointment by email, c/o ISTI-CNR | ||
====== Learning Material ====== | ====== Learning Material ====== | ||
Line 162: | Line 153: | ||
* Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, | * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, | ||
* [[http:// | * [[http:// | ||
- | * The chapters | + | * The chapters |
* Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7 | * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7 | ||
* Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition. | * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition. | ||
Line 168: | Line 159: | ||
- | ===== Slides | + | ===== Slides ===== |
- | * The slides used in the course will be inserted in the calendar after each class. Most of them are part of the slides provided by the textbook' | + | * The slides used in the course will be inserted in the calendar after each class. Most of them are part of the slides provided by the textbook' |
- | //The slides used during the course will be added to the calendar at the end of each lesson. Most of them are taken from those provided by the authors of the textbook: [[http:// | + | |
- | ===== Past Exams ===== | + | |
- | * Some text of past exams on **DM1 (6CFU)**: | + | |
+ | ===== Software ===== | ||
- | * {{ :dm:2017-1-19.pdf |}}, {{ :dm:2017-9-6.pdf |}}, {{ :dm:2016-05-30-dm1-seconda.pdf |}} | + | * Python - Anaconda (>3.7): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/ |
+ | * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/ | ||
+ | * Pandas: pandas is an open source, BSD-licensed library providing high-performance, | ||
- | * Some solutions of past exams containing exercises on KNN and Naive Bayes classifiers | + | Other software for Data Mining |
- | * {{ :dm:dm2_exam.2017.06.13_solutions.pdf |}}, {{ :dm:dm2_exam.2017.07.04_solutions.pdf |}}, {{ :dm: | + | |
+ | * [[http://www.cs.waikato.ac.nz/ | ||
+ | * Didactic Data Mining [[http:// | ||
+ | |||
+ | ====== Class Calendar (2023/2024) ====== | ||
- | * Some exercises | + | ===== First Semester |
- | * {{ : | + | |
+ | ^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^ | ||
+ | |01.| 18.09.2023 | 11-13 |C1| Overview, Introduction | {{ : | ||
+ | | | 20.09.2023 | 11-13 | | No Lecture | | | | ||
+ | |02.| 25.09.2023 | 11-13 |C1| Lab. Introduction to Python | {{ : | ||
+ | |03.| 27.09.2023 | 11-13 |C1| Lab. Data Understanding | {{ : | ||
+ | |04.| 02.10.2023 | 11-13 |C1| Data Understanding | {{ : | ||
+ | |05.| 04.10.2023 | 11-13 |C1| Data Understanding & Preparation | {{ : | ||
+ | |06.| 09.10.2023 | 11-13 |C1| Data Preparation & Data Similarity | {{ : | ||
+ | |07.| 11.10.2023 | 11-13 |C1| Data Similarity & Lab. Data Understanding | {{ : | ||
+ | |08.| 16.10.2023 | 11-13 |C1| Introduction to Clustering, K-Means | {{ : | ||
+ | |09.| 18.10.2023 | 11-13 |C1| Clustering Validation, Hierarchical Clustering | {{ : | ||
+ | |10.| 23.10.2023 | 11-13 |C1| Density-based Clustering | {{ : | ||
+ | |11.| 25.10.2023 | 11-13 |C1| Lab. Clustering | {{ : | ||
+ | |12.| 30.10.2023 | 11-13 |C1| Ex. Clustering | {{ : | ||
+ | | | 01.11.2023 | 11-13 | | No Lecture | | | | ||
+ | |13.| 06.11.2023 | 11-13 |C1| Intro Classification, | ||
+ | |14.| 08.11.2023 | 11-13 |C1| Naive Bayes, Exercises | {{ : | ||
+ | |15.| 13.11.2023 | 11-13 |C1| Model Evaluation | {{ : | ||
+ | |16.| 15.11.2023 | 11-13 |C1| Model Evaluation Exercises & Lab | {{ : | ||
+ | | | 20.11.2023 | 11-13 | | No Lecture | | | | ||
+ | |17.| 22.11.2023 | 11-13 |C1| Decision Tree Classifier | {{ : | ||
+ | |18.| 27.11.2023 | 11-13 |C1| Decision Tree Classifier | {{ : | ||
+ | |19.| 29.11.2023 | 11-13 |C1| Exercises and Lab. Decision Tree Classifier | {{ : | ||
+ | |20.| 04.12.2023 | 11-13 |C1| Decision Tree Classifier, Exercises and Lab | {{ : | ||
+ | |21.| 06.12.2023 | 11-13 |C1| Intro Regression & Lab. Regression | {{ : | ||
+ | |22.| 11.12.2023 | 11-13 |C1| Intro Pattern Mining and Apriori | {{ : | ||
+ | |23.| 13.12.2023 | 16-18 |C1| Apriori & Lab. Pattern Mining | {{ : | ||
+ | |24.| 18.12.2023 | 11-13 |C| FP-Growth and Exercises | {{ : | ||
+ | ===== Second Semester (DM2 - Data Mining: Advanced Topics and Applications) ===== | ||
- | * Some very old exercises (part of them with solutions) are available here, most of them in Italian, not all of them on topics covered in this year's program: | + | ^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^ |
- | | + | |01.| 19.02.2024 | 14-16 |C| Overview, Rule-based Models | {{ : |
- | | + | | | 21.02.2024 | | | No Lecture | | | |
- | * {{:dm:verifica.2008.04.03.pdf|Test 3 April 2008}} (and {{:dm:soluzioni.2008.04.03.pdf|Solutions}}), {{:dm:dm-tdm.appello_2008_07_18_parte1.pdf|Test 18 July 2008 - part 1}}, {{:dm:dm-tdm.appello_2008_07_18_parte2.pdf|Test 18 July 2008 - part 2}} | + | | | 26.02.2024 | | | No Lecture | | | |
- | * {{:dm:appello.2010.06.01_soluzioni.pdf| Exam with solution 2010-06-01}} {{:dm:appello.2010.06.22_soluzioni.pdf|Exam with solution 2010-06-22}} {{:dm:appello.2010.09.09_soluzioni.pdf|Exam with solution 2010-09-09}}{{:dm:appello.2010.07.13_soluzioni.pdf| Exam with solution 2010-07-13}} | + | |02.| 19.02.2024 | 11-13 |C| Sequential Pattern Mining | {{ :dm: |
+ | |03.| 04.03.2024 | 9-11 |C| Sequential Pattern Mining | {{ : | ||
+ | |04.| 06.03.2024 | 11-13 |C| Transactional Clustering | {{ :dm:17_dm2_transactional_clustering_2023_24.pdf | Transactional Clustering}} | Guidotti| | ||
+ | |05.| 11.03.2024 | 9-11 |C| Time Series Similarity | {{ : | ||
+ | |06.| 13.03.2024 | 11-13 |C| Time Series Approximation | {{ : | ||
+ | |07.| 18.03.2024 | 9-11 |C| Time Series Clustering & Motifs| {{ : | ||
+ | |08.| 20.03.2024 | 11-13 |C| Time Series Classification | {{ : | ||
+ | |09.| 25.03.2024 | 9-11 |C| Imbalanced Learning | {{ : | ||
+ | |10.| 27.03.2024 | 11-13 |C| Dimensionality Reduction | {{ : | ||
+ | |11.| 03.04.2024 | 11-13 |C| Outlier Detection | {{ : | ||
+ | |12.| 08.04.2024 | 9-11 |C| Outlier Detection | {{ : | ||
+ | |13.| 10.04.2024 | 11-13 |C| Outlier Detection | {{ : | ||
+ | |14.| 15.04.2024 | 14-16 |C| Gradient Descent, MLE | {{ : | ||
+ | |15.| 17.04.2024 | 11-13 |C| Odds, LogOdds, Logistic Regression| {{ : | ||
+ | |16.| 22.04.2024 | ||
+ | |17.| 14.04.2024 | 11-13 |C| Perceptron, Neural Networks| {{ : | ||
+ | ====== Exams ====== | ||
- | ===== Data mining software===== | + | ** How and Where: ** |
+ | The exam will be oral only and will take place at the teacher' | ||
+ | The exam will be held online on the 420AA Data Mining course channel only at the request of the | ||
+ | student in accordance with current legislation. | ||
- | | + | ** When: ** |
- | * [[https://www.continuum.io/downloads | Python - Anaconda (2.7 version!!!)]]: | + | The start dates of the three exams are, or will be, published on the online platform |
- | * Scikit-learn: | + | https://esami.unipi.it/. Within each session, we will identify dates and slots in order to distribute |
- | * Pandas: pandas | + | the various oral exams. The dates and slots for taking the exam will be published on the course page by the end of |
- | * [[http:// | + | May. Each student must also register on https://esami.unipi.it/. The exam can only be taken after the project has been delivered, and the project must be delivered one week before the date on which you want to take the exam. Oral discussions by project group will be preferred, so that each project is discussed once for the whole group. It is not mandatory to take the oral exam together |
- | + | In the event that the oral exam is not passed, it will not be possible | |
- | ====== Class calendar | + | ** What: ** |
+ | The oral test will evaluate the practical understanding of the algorithms. The exam will evaluate three aspects. | ||
+ | | ||
+ | - Understanding of the algorithms illustrated during the course and their practical implementation. You will be asked to perform one or more simple exercises. The text will be shown on the teacher' | ||
+ | - Discussion of the project with questions from the teacher regarding unclear aspects, | ||
+ | questionable steps or choices. | ||
- | ===== First part of course, first semester (DM1 - Data mining: foundations & DM - Data Mining) ===== | + | ** Final Mark: ** for the 12-credit exam, the final mark will be obtained as the |
+ | average mark of DM1 and DM2. | ||
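As a worked example of the averaging rule, assuming a plain arithmetic mean of the two marks (the page does not say how a fractional result is rounded), a student with a hypothetical 28 in DM1 and 30 in DM2 would obtain:

```latex
M_{\text{final}} = \frac{M_{\text{DM1}} + M_{\text{DM2}}}{2} = \frac{28 + 30}{2} = 29
```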
- | ^ ^ Day ^ Room ^ Topic ^ Learning material ^ Instructor ^ | + | ===== Exam Booking Periods ===== |
- | |1.| 19.09 14:00-16:00 | C1 | Overview. Introduction. | {{ : | + | |
- | |2.| 20.09 16:00-18:00 | C1 | Introduction | + | |
- | | | 21.09 | + | |
- | |3.| 24.09 14:00-16:00 | C1 | KDD Process & Applications. Data Understanding. | + | |
- | |4.| 26.09 14:00-16:00 | C1 | Data Understanding. Data Preparation | + | |
- | |5.| 28.09 11:00-13:00 | C1 | Introduction | + | |
- | |6.| 01.10 14:00-16:00 | C1 | Data Preparation | + | |
- | |7.| 03.10 14:00-16:00 | C1 | Clustering Introduction e Centroid-based clustering | + | |
- | | | 05.10 11:00-13:00 | C1 | Lecture canceled | | | | + | ===== Exam Booking Agenda ===== |
- | |8.| 08.10 14:00-16:00 | C1 | Knime - Python: Data Understanding | + | |
- | |9.| 10.10 14:00-16:00 | C1 | Clustering: K-means & Hierarchical | + | |
- | | | 12.10 11:00-13:00 | C1 | Lecture canceled for IF | | | | + | |
- | |10.| 15.10 14:00-16:00 | C1 | Clustering: DBSCAN | + | |
- | |11.| 17.10 14:00-16:00 | C1 | Clustering: Validity | + | * 5th exam session: DM1 & DM2: from 19/07/2024 to 24/07/2024 (deliver project by 12/06/2024) |
- | |12.| 19.10 11:00-13:00 | C1 | Discussion on Projects - DU | | Guidotti | | + | |
- | |13.| 22.10 14:00-16:00 | C1 | Exercises for mid-term test | Tool for Dm ex: [[http://matlaspisa.isti.cnr.it:5055/Help|Didactic Data Mining ]] {{ :dm:ex-clustering.pdf | Ex. Clustering PDF}} {{ :dm:ex-clustering.zip |Ex. Clustering PPTX}}| Monreale | | + | |
- | |14.| 24.10 14:00-16:00 | C1 | Knime - Python: Clustering | + | |
- | |15.| 26.10 11:00-13:00 | C1 | Exercises for mid-term test | {{ :dm:clustering-2.zip |Ex. Clustering PPTX - complete }} {{ : | + | |
- | |16.| 05.11 14:00-16:00 | C1 | Classification/1 | {{ : | + | |
- | |17.| | + | |
- | | | + | |
- | |18.| 12.11 14:00-16:00 | C1 | LAB: Classification | + | |
- | |19.| 14.11 14:00-16:00 | C1 | Pattern Mining | + | |
- | |20.| 16.11 11:00-13:00 | C1 | Pattern Mining| | Pedreschi | | + | |
- | |21.| 19.11 14: | + | |
- | |22.| 21.11 14: | + | |
- | | | | | **The next lectures are dedicated to the DM of 9 credits** | | | | + | |
- | |23.| 23.11 11:00-13:00 | C1| Alternative methods for Pattern Mining. Privacy in DM | {{ : | + | |
- | |24.| 26.11 14: | + | |
- | |25.| 28.11 14: | + | |
- | |26.| 30.11 11: | + | |
- | |27.| 03.12 14: | + | |
- | |28.| 05.12 14: | + | |
- | |29.| 07.12 11: | + | |
- | |30.| 10.12 14: | + | |
- | |31.| 12.12 14: | + | |
- | |32.| 14.12 11:00-13:00 | C1| Cancelled | | | | + | |
+ | **Do not forget to complete the course evaluation!** | ||
+ | ===== Exam DM1 ====== | ||
- | ===== Second part of course, second semester (DMA - Data mining: advanced topics and case studies) ===== | + | The exam is composed |
- | ^ ^ Day ^ Room ^ Topic ^ Learning material ^ Instructor (default: Nanni)^ | + | * An **oral exam**, |
- | |1.| 21.02.2019 14:00-16:00 | A1 | Introduction + Sequential patterns/1 | {{ : | + |
- | |2.| 22.02.2019 16:00-18:00 | C1 | Sequential patterns/ | + | |
- | |3.| 01.03.2019 16:00-18:00 | C1 | Sequential patterns/3 | {{ : | + | |
- | |4.| 07.03.2019 14:00-16:00 | A1 | Sequential patterns/4 | Sequential pattern tools: Link to [[http:// | + | |
- | |5.| 08.03.2019 16:00-18:00 | C1 | Time series/ | + | |
- | |6.| 14.03.2019 14:00-16:00 | A1 | Time series/2 | [[https:// | + | |
- | |7.| 15.03.2019 16:00-18:00 | C1 | Time series/3 | | | | + | |
- | |8.| 21.03.2019 14:00-16:00 | A1 | Time series/4 | {{ : | + | |
- | |9.| 22.03.2019 16:00-18:00 | C1 | Time series/5 | | | | + | |
- | |10.| 28.03.2019 14:00-16:00 | A1 | Exercises for mid-term exam | {{ : | + | |
- | |11.| 29.03.2019 16:00-18:00 | C1 | Exercises for mid-term exam | {{ : | + | |
- | | | + | |
- | |11.| 11.04.2019 14:00-16:00 | A1 | Classification: | + | |
- | |12.| 12.04.2019 16:00-18:00 | C1 | Classification: | + | |
- | | | < | + | |
- | |13.| 03.05.2019 16:00-18:00 | C1 | Classification: | + | |
- | |14.| 09.05.2019 14:00-16:00 | A1 | Classification: | + | |
- | |15.| 10.05.2019 16:00-18:00 | C1 | Classification: | + | |
- | |16.| 16.05.2019 14:00-16:00 | A1 | Classification: | + | |
- | |17.| 17.05.2019 16:00-18:00 | C1 | Classification: | + | |
- | |18.| 23.05.2019 14:00-16:00 | A1 | Exercises + Outlier detection/1 | {{ : | + | |
- | |19.| 24.05.2019 16:00-18:00 | C1 | Outlier detection/2 | {{ : | + | |
- | |< | + | |
- | | | 06.06.2019 16:00-18:00 | E (+A1) | **mid-term exam** | {{ : | + | |
- | ====== Exams ====== | + | |
- | ===== Exam DM part I (DMF) ====== | + | * A **project**, |
+ | |||
+ | * **Dataset** | ||
+ | - Assigned: 25/ | ||
+ | - MidTerm Submission: 15/11/2023 (+0.5) (half project required, i.e., Data Understanding & Preparation and Clustering) | ||
+ | - Final Submission: 31/12/2023 (+0.5) one week before the oral exam (complete project required). | ||
+ | - Dataset: {{ :dm:std.zip | STD}} | ||
- | The exam is composed of three parts: | + | ** DM1 Project Guidelines ** |
+ | See {{ :dm: | ||
- | * A **written exam**, with exercises and questions about methods and algorithms presented during the classes. It can be substituted by the first and second mid-term tests of November and December. | ||
- | * An **oral exam (optional) **, that includes: (1) discussing the project report with a group presentation; | ||
- | * A **project** consists of exercises that require the use of data mining tools for data analysis. Exercises include: data understanding, | ||
- | Tasks of the project: | ||
- | - ** Data Understanding (Collective discussion on: 19/ | ||
- | - ** Clustering analysis (Collective discussion on: 21/ | ||
- | - ** Classification (Collective discussion on: 12/ | ||
- | - ** Association Rules (Collective discussion on: 12/ | ||
+ | |||
+ | ===== Exam DM2 ====== | ||
- | * Project 1 | + | The exam is composed of two parts: |
- | - Dataset: **Credit Card Default** | + | |
- | - Assigned: 01/ | + | |
- | - Deadline: < | + | |
- | - Link: https:// | + | |
+ | * An **oral exam**, that includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises. | ||
- | * Project | + | * A **project**, |
- | - Dataset: | + | |
- | - Assigned: | + | |
- | - Deadline: 31/05/2019 | + | - Assigned: |
- | - Link: https://www.kaggle.com/blastchar/telco-customer-churn | + | - MidTerm Submission: 30/04/2024 (Modules 1 and 2; for TS classification, non-DL-based models only) |
+ | - Final Submission: one week before the oral exam (complete project required, also with DL-based models for TS classification). | ||
+ | - Dataset: [[https://unipiit-my.sharepoint.com/:u:/g/ | ||
+ | ** DM2 Project Guidelines ** | ||
+ | See {{ : | ||
- | | ||
- | |||
- | ===== Exam DM part II (DMA) ====== | ||
- | The exam is composed of three parts: | ||
- | * A **written exam**, with exercises and questions about methods and algorithms presented during the classes. It can be substituted by the first and second mid-term tests of April and June. | ||
- | *< | ||
- | | + | ===== Past Exams ===== |
+ | | ||
- | * A **project**, | + | ===== Reading About the "Data Scientist" |
- | * **Dataset**: | + | |
- | * **Task 1: Time series**: Consider only attribute | + | |
- | * **Task 2: Sequential patterns**: discover contiguous sequential patterns of at least length 4. Before that, time series should be discretized in some way. | + | |
- | * **Task 3: | + | |
- | * **Task 4: Outlier detection**: | + | |
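Task 2 of the 2018/19 DM2 project asks for contiguous sequential patterns of length at least 4, after discretizing the time series. A minimal sketch of one way to do this, assuming equal-width binning of each series into three symbols (a simplification chosen here; SAX or quantile bins would be equally valid) and defining the support of a pattern as the number of sequences that contain it. The function names, parameters and toy series are hypothetical:

```python
from collections import Counter

import numpy as np


def discretize(series, alphabet="abc"):
    """Map each value to a symbol using equal-width bins over the series' range.

    Equal-width binning is an illustrative assumption; SAX or quantile bins
    would work the same way with different bin edges.
    """
    values = np.asarray(series, dtype=float)
    n_bins = len(alphabet)
    # Inner edges only, so np.digitize returns indices 0 .. n_bins - 1
    edges = np.linspace(values.min(), values.max(), n_bins + 1)[1:-1]
    return "".join(alphabet[i] for i in np.digitize(values, edges))


def frequent_contiguous_patterns(sequences, min_length=4, min_support=2):
    """Return {pattern: support} for contiguous patterns of length >= min_length.

    Support = number of sequences containing the pattern at least once.
    """
    result = {}
    longest = max(len(s) for s in sequences)
    for length in range(min_length, longest + 1):
        support = Counter()
        for s in sequences:
            # A set, so each pattern is counted at most once per sequence
            support.update({s[i:i + length] for i in range(len(s) - length + 1)})
        frequent = {p: c for p, c in support.items() if c >= min_support}
        if not frequent:
            # Anti-monotonicity: every longer pattern contains an infrequent
            # shorter one, so no longer pattern can be frequent either
            break
        result.update(frequent)
    return result


# Hypothetical usage on two toy time series
series = [[1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 8, 7, 1, 2, 3, 4, 5, 6]]
symbols = [discretize(ts) for ts in series]
patterns = frequent_contiguous_patterns(symbols, min_length=4, min_support=2)
print(symbols)
print(patterns)
```

Growing the pattern length one step at a time and stopping at the first length with no frequent pattern is the same pruning idea used by Apriori; here it is valid because any sequence containing a longer contiguous pattern also contains each of its shorter contiguous sub-patterns.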
- | ====== Exam sessions ====== | + | ** ... a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/ |
- | ===== Mid-term exams ===== | + | //Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.// |
- | ^ ^ Date ^ Hour ^ Place ^ Notes ^ Marks ^ | + | * Data, data everywhere. The Economist, Feb. 2010 {{:dm:economist--010.pdf|download}} |
- | | DM1: First Mid-term 2018 | 30.10.2018 | 11-13 | Room C1, L1, N1 | Please use the registration system: | + | * Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011 [[http://tech.fortune.cnn.com/2011/ |
- | | DM1: Second Mid-term 2018 | 18.12.2018| 11-13 | Room C1, L1, N1 | Please use the registration system: | + | * Welcome to the yotta world. The Economist, Sept. 2011 {{:dm:economist-2012-dm.pdf|download}} |
- | | DM2: First Mid-term 2019 | 04.04.2019 | 16-18 | Room A1, E | Please use the registration system: | + | * Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012 [[http:// |
- | | DM2: Second Mid-term 2019 | 06.06.2019 | 16-18 | Room E \\ (+ A1 if needed) | Please use the registration system: | + | * Il futuro è già scritto in Big Data. Il Sole 24 Ore, Sept 2012 [[http://www.ilsole24ore.com/art/ |
- | ===== Regular exam sessions ===== | + | * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics |
- | ^ Session ^ Date ^ Time ^ Room ^ Notes ^ Marks ^ | + | * Peter Sondergaard, |
- | |1.|16.01.2019| 14:00 - 18:00| Room E | | | | + | * Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. [[http://www.fusioncharts.com/ |
- | |2.|06.02.2019| 14:00 - 18:00| Room E | | | | + | |
- | |3.|19.06.2019| 09:00 - 13:00| Room A1 | Oral exam on DM1 by 15 July. If you cannot take it by then, you can take the oral exam in September.| | | + |
- | |4.|10.07.2019| 09:00 - 13:00| Room A1 |Oral exam on DM1 by 15 July. If you cannot take it by then, you can take the oral exam in September. | | | + |
- | ===== Extra exam sessions A.A. 2017/18 | + |
- | + | ||
- | ^ Date ^ Time ^ Room ^ Notes ^ Results ^ | + | |
====== Previous years ====== | ====== Previous years ====== | ||
+ | * [[dm.2022-23ds]] | ||
+ | * [[dm.2021-22ds]] | ||
+ | * [[dm.2020-21]] | ||
+ | * [[dm.2019-20]] | ||
+ | * [[dm.2018-19]] | ||
* [[dm.2017-18]] | * [[dm.2017-18]] | ||
* [[dm.2016-17]] | * [[dm.2016-17]] | ||
Line 349: | Line 345: | ||
* [[dm.2012-13]] | * [[dm.2012-13]] | ||
* [[dm.2011-12]] | * [[dm.2011-12]] | ||
- | * [[dm.2010-11]] | + | |
- | * [[dm.2009-10]] | + | |
- | * [[dm.2008-09]] | + | |
- | * [[dm.2007-08]] | + | |
- | * [[dm.2006-07]] | + | |
- | * [[PhDWorkshop2011]] | + | |
- | * [[SNA.Ingegneria2011]] | + | |
- | * [[SNA.IMT.2011]] | + | |
- | * [[MAINS.SANTANNA.2011-12]] | + | |
- | * [[MAINS.SANTANNA.DM4CRM.2012]] | + | |
- | * [[MAINS.SANTANNA.DM4CRM.2016]] | + | |
- | * [[MAINS.SANTANNA.DM4CRM.2017 | Data Mining for Customer Relationship Management 2017]] | + | |
- | * [[MAINS.SANTANNA.DM4CRM.2018]] | + | |
- | * [[MAINS.SANTANNA.DM4CRM.2019]] | + | |
- | * [[SDM2018 | Instructions for camera ready and copyright transfer]] | + | |
- | * [[DM-SAM | Storie dell' | + |