dm:start
Differences
These are the differences between the selected revision and the current version of the page.
| dm:start [29/04/2019 at 09:50 (7 years ago)] – [Second part of course, second semester (DMA - Data mining: advanced topics and case studies)] Mirco Nanni | dm:start [03/11/2025 at 17:58 (43 hours ago)] (current version) – [First Semester (DM1 - Data Mining: Foundations)] Fosca Giannotti | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | < | + | ====== Data Mining A.A. 2025/26 ====== |
| - | <!-- Google Analytics --> | + | |
| - | <script type=" | + | |
| - | (function(i, | + | |
| - | (i[r].q=i[r].q||[]).push(arguments)}, | + | |
| - | m=s.getElementsByTagName(o)[0]; | + | |
| - | })(window, | + | |
| - | ga(' | + | ===== DM1 - Data Mining: Foundations |
| - | ga(' | + | |
| - | ga(' | + | |
| - | + | ||
| - | ga(' | + | |
| - | ga(' | + | |
| - | setTimeout(" | + | |
| - | </ | + | |
| - | <!-- End Google Analytics --> | + | |
| - | <!-- Capture clicks --> | + | |
| - | < | + | |
| - | jQuery(document).ready(function(){ | + | |
| - | jQuery(' | + | |
| - | var fname = this.href.split('/' | + | |
| - | ga(' | + | |
| - | }); | + | |
| - | jQuery(' | + | |
| - | var fname = this.href.split('/' | + | |
| - | ga(' | + | |
| - | }); | + | |
| - | jQuery(' | + | |
| - | var fname = this.href.split('/' | + | |
| - | ga(' | + | |
| - | }); | + | |
| - | jQuery(' | + | |
| - | var fname = this.href.split('/' | + | |
| - | ga(' | + | |
| - | }); | + | |
| - | jQuery(' | + | |
| - | var fname = this.href.split('/' | + | |
| - | ga(' | + | |
| - | }); | + | |
| - | }); | + | |
| - | </ | + | |
| - | </ | + | |
| - | ====== Data Mining A.A. 2018/19 ====== | + | |
| - | ===== DM 1: Foundations of Data Mining (6 CFU) ===== | + | Instructors: |
| - | + | ||
| - | Instructors | + | |
| * **Dino Pedreschi** | * **Dino Pedreschi** | ||
| - | * KDD Laboratory, Università di Pisa ed ISTI - CNR, Pisa | + | * KDDLab, Università di Pisa |
| * [[http:// | * [[http:// | ||
| * [[dino.pedreschi@unipi.it]] | * [[dino.pedreschi@unipi.it]] | ||
| - | Teaching assistant - Assistente: | ||
| * **Riccardo Guidotti** | * **Riccardo Guidotti** | ||
| - | * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa | + | * KDDLab, Università di Pisa |
| - | * [[guidotti@di.unipi.it]] | + | * [[https:// |
| - | + | * [[riccardo.guidotti@di.unipi.it]] | |
| - | | + | |
| - | ===== DM 2: Advanced topics on Data Mining and case studies (6 CFU) ===== | + | Teaching Assistant |
| + | * **Alessio Cascione** | ||
| + | * KDDLab, Università di Pisa | ||
| + | * [[https:// | ||
| + | * [[alessio.cascione@phd.unipi.it]] | ||
| - | Instructors: | + | ===== DM2 - Data Mining: Advanced Topics and Applications |
| - | * **Mirco Nanni, Dino Pedreschi** | + | |
| - | * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa | + | |
| - | * [[http:// | + | |
| - | * [[mirco.nanni@isti.cnr.it]] | + | |
| - | * [[dino.pedreschi@unipi.it]] | + | |
| - | + | ||
| - | ===== DM: Data Mining (9 CFU) ===== | + | |
| Instructors: | Instructors: | ||
| - | * **Dino Pedreschi, Anna Monreale** | ||
| - | * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa | ||
| - | * [[http:// | ||
| - | * [[mirco.nanni@isti.cnr.it]] | ||
| - | * [[dino.pedreschi@unipi.it]] | ||
| - | * [[anna.monreale@unipi.it]] | ||
| - | |||
| - | Teaching assistant - Assistente: | ||
| * **Riccardo Guidotti** | * **Riccardo Guidotti** | ||
| - | * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa | + | * KDDLab, Università di Pisa |
| - | * [[guidotti@di.unipi.it]] | + | * [[https:// |
| + | * [[riccardo.guidotti@di.unipi.it]] | ||
| - | ====== News ===== | + | Teaching Assistant |
| - | * ** Last exam session on Feb, 14. Please register your name here: https:// | + | * **Alessio Cascione** |
| - | * Results of the written exam of Feb {{ : | + | * KDDLab, Università di Pisa |
| - | * Results of the written exam of January {{: | + | * [[https://www.linkedin.com/in/alessio-cascione-a77224159/? |
| - | * Dates for exam registration: | + | * [[alessio.cascione@phd.unipi.it]] |
| - | * ** I set up 3 days for the oral exam: 25, 28, 29 January. Other dates will be available after the written exam of Feb. To book your oral exam, please use the doodle, indicating your Surname and Name: https://doodle.com/poll/3wunys9yd8s9q8ay ** | + | |
| - | * Final results including project evaluation available here: {{ : | + | |
| - | * **New project is available!** | + | |
| - | * *Results of the {{ : | + | |
| - | * Get clusters from scipy dendrogram: https:// | + | |
| - | * Help for installing Pyfim library https:// | + | |
| - | * *Results of the {{ : | + | |
| - | * Students need to decide the group composition for the project and fill this [[https:// | + | |
| - | + | ====== | |
| - | + | * [07.10.2025] The lecture of Thursday 10/10/2025 is canceled due to the UniPi Orienta event. The recovery lecture is Tuesday 14/10/2025 9-11 room M1. | |
| - | ====== | + | * [06.10.2025] Link to Project Groups Registration DM1 [25/26] (max 3 students for each group - access with your University of Pisa account, deadline 17/ |
| + | * [28.07.2025] Lectures will start on Monday 29 September 2025 at 09.00 in room E. Lectures will be held in person only. Recordings of the lectures of past years can be found at the bottom of this web page. | ||
| + | |||
| - | ** ... a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician | + | ====== Learning Goals ====== |
| + | | ||
| + | * Fundamental concepts | ||
| + | | ||
| + | * Data preparation | ||
| + | * Clustering | ||
| + | * Classification | ||
| + | * Pattern Mining and Association Rules | ||
| + | * Sequential Pattern Mining | ||
| - | //Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.// | + | * DM2 |
| + | * Outlier Detection | ||
| + | * Dimensionality Reduction | ||
| + | * Regression | ||
| + | * Advanced Classification and Regression | ||
| + | * Time Series Analysis | ||
| + | * Transactional Clustering | ||
| + | * Explainability | ||
| - | The wide availability of data from relational databases, | + | ====== Hours and Rooms ====== |
| - | - the basic concepts of the knowledge discovery process: data understanding and preparation, data types, data measures and similarity; | + | |
| - | - the main data mining techniques (association rules, | + | |
| - | - some case studies in marketing and customer management support, fraud detection, and epidemiological studies. | + | |
| - | - the last part of the course aims to introduce the privacy and ethical aspects of applying inference techniques to data, which the analyst must be aware of | + | |
| - | ===== Reading about the "data scientist" | + | ===== DM1 ===== |
| - | | + | **Classes** |
| - | | + | |
| - | | + | |
| - | | + | |
| - | * Il futuro è già scritto in Big Data. Il Sole 24 Ore, Sept 2012 [[http:// | + | |
| - | * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{: | + | |
| - | * Peter Sondergaard, | + | |
| - | * Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. [[http:// | + | ^ Day of Week ^ Hour ^ Room ^ |
| - | ====== Hours - Orario e Aule ====== | + | | Monday |
| + | | Thursday | ||
| - | ===== DM1 & DM ===== | + | **Office hours - Ricevimento: |
| - | **Classes | + | |
| + | | ||
| + | | ||
| - | ^ Day of Week ^ Hour ^ Room ^ | + | * Prof. Guidotti |
| - | | Lunedì/ | + | * Thursday |
| - | | Mercoledì/ | + | * Room 363 Dept. of Computer Science or MS Teams |
| - | | Venerdì/ | + | |
| - | **Office hours - Ricevimento: | ||
| - | * Prof. Pedreschi: Lunedì/ | + | * Alessio Cascione |
| - | * Prof. Monreale: by appointment, | + | * Google Meet slot - https://calendly.com/ |
| - | * Dr. Guidotti: class-appointment | + | * Alternative |
| | | ||
| ===== DM 2 ===== | ===== DM 2 ===== | ||
| - | **Classes | + | **Classes** |
| - | ^ Day of week | + | ^ Day of Week |
| - | | Thursday | + | | |
| - | | Friday | + | | |
| - | **Office | + | **Office |
| + | |||
| + | * Tuesday 15.00-17.00 or Appointment by email | ||
| + | * Room 363 Dept. of Computer Science or MS Teams | ||
| - | * Nanni : appointment by email, c/o ISTI-CNR | ||
| ====== Learning Material -- Materiale didattico ====== | ====== Learning Material -- Materiale didattico ====== | ||
| Line 157: | Line 105: | ||
| * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, | * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, | ||
| * [[http:// | * [[http:// | ||
| - | * The chapters | + | * The chapters |
| * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7 | * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7 | ||
| * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition. | * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition. | ||
| Line 163: | Line 111: | ||
| - | ===== Slides | + | ===== Slides ===== |
| - | * The slides used in the course will be inserted in the calendar after each class. Most of them are part of the the slides provided by the textbook' | + | * The slides used in the course will be inserted in the calendar after each class. Most of them are part of the slides provided by the textbook' |
| - | //The slides used in the course will be added to the calendar after each class. Most of them are taken from those provided by the textbook's authors: [[http:// | + | |
| - | ===== Past Exams ===== | + | |
| + | ===== FAQ ===== | ||
| - | * Some text of past exams on **DM1 (6CFU)**: | + | For the academic year 2025/2026, we make available a document containing |
| + | Please consult this document first, as your question may already be answered there. | ||
| + | The FAQ will be updated regularly after each lecture with new relevant questions from students. | ||
| - | * {{ :dm:2017-1-19.pdf |}}, {{ : | + | Check the document: |
| + | https://docs.google.com/ | ||
| + | ===== Software===== | ||
| - | * Some solutions of past exams containing exercises on KNN and Naive Bayes classifiers | + | |
| - | * {{ :dm: | + | * Scikit-learn: python library with tools for data mining and data analysis [[http:// |
| + | * Pandas: pandas is an open source, BSD-licensed library providing high-performance, | ||
| - | * Some exercises (partially with solutions) on **sequential patterns** and **time series** can be found in the following texts of exams from the last years: | + | Other software for Data Mining |
| - | * {{ :dm:dm2_exam.2015.04.13.results.pdf|}}, {{ :dm:dm2_exam.2016.04.4_sol.pdf |}}, {{ :dm:dm2_exam.2016.04.5_sol.pdf |}}, {{ :dm:dm2_exam.2016.06.20_sol.pdf | + | * [[http://www.knime.org | KNIME ]] The Konstanz Information Miner. [[http:// |
| + | * [[http://www.cs.waikato.ac.nz/ | ||
| + | * Didactic Data Mining [[http:// | ||
| + | |||
| + | ====== Class Calendar (2025/2026) ====== | ||
| + | ===== First Semester (DM1 - Data Mining: Foundations) ===== | ||
| - | | + | ^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^ |
| - | * {{tdm: | + | | |
| - | * {{dm: | + | | | 18.09.2025 |
| - | * {{:dm:verifica.2008.04.03.pdf|Test of 3 April 2008}} (and {{:dm:soluzioni.2008.04.03.pdf|Solutions}}), {{:dm:dm-tdm.appello_2008_07_18_parte1.pdf|Test of 18 July 2008 - part 1}}, {{:dm:dm-tdm.appello_2008_07_18_parte2.pdf|Test of 18 July 2008 - part 2}} | + | | | 22.09.2025 | | | No Lecture | | | |
| - | | + | | | 25.09.2025 | | | No Lecture | | | |
| + | |01.| 29.09.2025 | 09-11 | E | Overview, Introduction | {{ :dm: | ||
| + | |02.| 02.10.2025 | 09-11 | E | The KDD process | {{ :dm: | ||
| + | |03.| 06.10.2025 | 09-11 | E | Introduction to Python | ||
| + | | | 09.10.2025 | | | No Lecture (UNIPI Orienta) | | | | ||
| + | |04.| 13.10.2025 | 09-11 | E | Data Understanding | {{ :dm:01_dm1_data_understanding_2025_26.pdf | Data Understanding | ||
| + | |05.| 14.10.2025 | 09-11 | C1 | Data Preparation | {{ :dm:02_dm1_data_preparation_2025_26.pdf | Data Preparation}}, {{ :dm:03_dm1_data_similarity_2025_26.pdf | Data Similarity}} | Guidotti | | ||
| + | |04.| 16.10.2025 | 09-11 | E | Data Understanding Lab| {{ :dm:16.10.25_data_understanding_2025_lecture_in_class.zip |}} | Guidotti, Cascione | | ||
| + | |06.| 20.10.2025 | 09-11 | E | Data Similarity and Introduction to Clustering | {{ :dm:03_dm1_data_similarity_2025_26.pdf | Data Similarity}}, {{ :dm:04_dm1_clustering_intro_2025_26.pdf | Introduction to Clustering}} | Guidotti | | ||
| + | |07.| 23.10.2025 | 09-11 | E | Centroid-based Clustering Algorithm | {{ :dm:05_dm1_kmeans_2025_26.pdf | Centroid-based Clustering}} | Guidotti | | ||
| + | |08.| 27.10.2025 | 09-11 | E | Hierarchical Clustering Algorithm | {{ :dm:06_dm1_hierarchical_clustering_2025_26.pdf | Hierarchical Clustering}} | Guidotti | | ||
| + | |09.| 27.10.2025 | 09-11 | E | Density-based Clustering Algorithm | {{ :dm:07_dm1_density_based_2025_26.pdf | Density-based Clustering}} | Guidotti | | ||
| + | |10.|03.11.2025 | 09-11 | E | Clustering Lab | {{ :dm:03.11.25_clustering_2025_lecture_in_class.zip |}} | Pedreschi, Cascione | | ||
| + | ===== Second Semester (DM2 - Data Mining: Advanced Topics and Applications) ===== | ||
| - | ===== Data mining software===== | + | ^ ^ Day ^ Time ^ Room ^ Topic ^ Material ^ Lecturer ^ |
| + | |01.| 18.02.2025 | 14-16 |A1| Overview, Imbalanced Learning | {{ : | ||
| - | * [[http:// | + | ====== Exams ====== |
| - | * [[https:// | + | |
| - | * Scikit-learn: | + | |
| - | * Pandas: pandas is an open source, BSD-licensed library providing high-performance, | + | |
| - | * [[http:// | + | |
| - | + | ||
| - | ====== Class calendar - Calendario delle lezioni (2018/2019) ====== | + | ** How and Where: ** |
| + | The exam will take place in oral mode only at the teacher' | ||
| + | The exam will be held online on the 420AA Data Mining course channel only at the request of the | ||
| + | student in accordance with current legislation. | ||
| - | ===== First part of course, first semester (DM1 - Data mining: foundations & DM - Data Mining) ===== | + | ** When: ** |
| + | The dates relating to the start of the three exams are/will be published on the online platform | ||
| + | https:// | ||
| + | various orals. The dates and slots to take the exam will be published on the course page by the end of | ||
| + | May. Each student must also register on https:// | ||
| + | If the oral exam is not passed, it cannot be retaken for 20 days. If the project is not considered sufficient, it must be carried out again on a new dataset or on a substantially updated version of the current one. | ||
| - | ^ ^ Day ^ Room ^ Topic ^ Learning material ^ Instructor ^ | + | ** What: ** |
| - | |1.| 19.09 14:00-16:00 | C1 | Overview. Introduction. | + | The oral test will evaluate the practical understanding of the algorithms. The exam will evaluate three aspects. |
| - | |2.| 20.09 16:00-18:00 | C1 | Introduction | + | - Understanding |
| - | | | 21.09 11:00-13:00 | C1 | Lecture canceled | + | - Understanding |
| - | |3.| 24.09 14:00-16:00 | C1 | KDD Process & Applications. Data Understanding. | + | - Discussion |
| - | |4.| 26.09 14:00-16:00 | C1 | Data Understanding. | + | |
| - | |5.| 28.09 11:00-13:00 | C1 | Introduction | + | |
| - | |6.| 01.10 14:00-16:00 | C1 | Data Preparation | + | |
| - | |7.| 03.10 14:00-16:00 | C1 | Clustering Introduction and Centroid-based clustering | + | |
| - | | | 05.10 11:00-13:00 | C1 | Lecture canceled | | | | + | |
| - | |8.| 08.10 14:00-16:00 | C1 | Knime - Python: Data Understanding | + | |
| - | |9.| 10.10 14:00-16:00 | C1 | Clustering: K-means & Hierarchical | + | |
| - | | | 12.10 11:00-13:00 | C1 | Lecture canceled for IF | | | | + | |
| - | |10.| 15.10 14:00-16:00 | C1 | Clustering: DBSCAN | + | |
| - | |11.| 17.10 14:00-16:00 | C1 | Clustering: Validity | + | |
| - | |12.| 19.10 11:00-13:00 | C1 | Discussion | + | |
| - | |13.| 22.10 14:00-16:00 | C1 | Exercises for mid-term test | Tool for Dm ex: [[http:// | + | |
| - | |14.| 24.10 14:00-16:00 | C1 | Knime - Python: Clustering | + | |
| - | |15.| 26.10 11:00-13:00 | C1 | Exercises for mid-term test | {{ : | + | |
| - | |16.| 05.11 14:00-16:00 | C1 | Classification/ | + | |
| - | |17.| 07.11 14:00-16:00 | C1 | Classification/ | + | |
| - | | | + | |
| - | |18.| 12.11 14:00-16:00 | C1 | LAB: Classification | + | |
| - | |19.| 14.11 14:00-16:00 | C1 | Pattern Mining | + | |
| - | |20.| 16.11 11:00-13:00 | C1 | Pattern Mining| | Pedreschi | | + | |
| - | |21.| 19.11 14: | + | |
| - | |22.| 21.11 14: | + | |
| - | | | | | **The next lectures are dedicated | + | |
| - | |23.| 23.11 11: | + | |
| - | |24.| 26.11 14: | + | |
| - | |25.| 28.11 14: | + | |
| - | |26.| 30.11 11: | + | |
| - | |27.| 03.12 14: | + | |
| - | |28.| 05.12 14: | + | |
| - | |29.| 07.12 11: | + | |
| - | |30.| 10.12 14: | + | |
| - | |31.| 12.12 14: | + | |
| - | |32.| 14.12 11: | + | |
| + | ** Final Mark: ** for the 12-credit exam, the final mark is the | ||
| + | average of the DM1 and DM2 marks. | ||
| - | ===== Second part of course, second semester (DMA - Data mining: advanced topics | + | *** Exams Registration Instructions for DM1*** |
| + | - Use the Google registration form: TBD if you cannot register on Esami on Data Mining for year 2025/2026. | ||
| + | - When the registration closes you will receive a link to the Agenda | ||
| + | - Register on the Agenda, selecting a day and time (do not change your choice or cancel: by booking you commit to taking the exam) | ||
| + | - Submit the project at least 1 week before the day you selected (or by 31/12 to get the +0.5 extra mark) | ||
| - | ^ ^ Day ^ Room (Aula) ^ Topic ^ Learning material ^ Instructor (default: Nanni)^ | + | ===== Exam Booking Periods ===== |
| - | |1.| 21.02.2019 14:00-16:00 | A1 | Introduction + Sequential patterns/1 | {{ : | + | |
| - | |2.| 22.02.2019 16:00-18:00 | C1 | Sequential patterns/ | + | |
| - | |3.| 01.03.2019 16:00-18:00 | C1 | Sequential patterns/3 | {{ : | + | |
| - | |4.| 07.03.2019 14:00-16:00 | A1 | Sequential patterns/4 | Sequential pattern tools: Link to [[http://www.philippe-fournier-viger.com/spmf/|SPMF]] + {{ : | + | |
| - | |5.| 08.03.2019 16:00-18:00 | C1 | Time series/ | + | * 3rd Appello: from TBD to TBD |
| - | |6.| 14.03.2019 14:00-16:00 | A1 | Time series/2 | [[https:// | + | |
| - | |7.| 15.03.2019 16:00-18:00 | C1 | Time series/3 | | | | + | |
| - | |8.| 21.03.2019 14:00-16:00 | A1 | Time series/4 | {{ : | + | |
| - | |9.| 22.03.2019 16:00-18:00 | C1 | Time series/5 | | | | + | |
| - | |10.| 28.03.2019 14:00-16:00 | A1 | Exercises for mid-term exam | {{ : | + | |
| - | |11.| 29.03.2019 16:00-18:00 | C1 | Exercises for mid-term exam | {{ : | + | |
| - | | | 04.04.2019 16:00-18:00 | A1 + E | **mid-term exam** | | | | + | |
| - | |11.| 11.04.2019 14:00-16:00 | A1 | Classification: | + | |
| - | |12.| 12.04.2019 16:00-18:00 | C1 | Classification: | + | |
| - | | | < | + | |
| - | |13.| 03.05.2019 16:00-18:00 | C1 | Classification: | + | |
| - | ====== Exams ====== | + | |
| - | ===== Exam DM part I (DMF) ====== | + | ===== Exam DM1 ====== |
| - | The exam is composed of three parts: | + | The exam is composed of two parts: |
| - | * A **written | + | * An **oral exam**, |
| - | * An **oral exam (optional) | + | * A **project**, that consists in exercises |
| - | + | ||
| - | * A **project** | + | |
| - | Tasks of the project: | + | - Assigned: 15/10/2025 |
| - | | + | - MidTerm Submission: 15/11/2025 (+0.5) (half project required, i.e., Data Understanding & Preparation |
| - | - ** Clustering analysis (Collective discussion on: 21/11/2018): ** Explore the dataset using various clustering techniques. Carefully describe your decisions for each algorithm | + | - Final Submission: 31/12/2025 (+0.5) one week before |
| - | - ** Classification (Collective discussion on: 12/12/2018): ** Explore | + | - Dataset: Download here {{ :dm: |
| - | - ** Association Rules (Collective discussion on: 12/12/2018): ** Explore the dataset using frequent pattern mining and association rules extraction. Then use them to predict a variable either for replacing missing values or to predict target variable. (see Guidelines for details) | + | |
| + | ** DM1 Project Guidelines ** | ||
| + | See {{ : | ||
| - | * Project 1 | ||
| - | - Dataset: **Credit Card Default** | ||
| - | - Assigned: 01/10/2018 | ||
| - | - Deadline: < | ||
| - | - Link: https:// | ||
| + | ===== Exam DM2 ====== | ||
| - | * Project 2 | + | The exam is composed of two parts: |
| - | - Dataset: **Telco Customer Churn** | + | |
| - | - Assigned: 10/ | + | |
| - | - Deadline: 31/05/2019 | + | |
| - | - Link: https:// | + | |
| + | * An **oral exam**, that includes: (1) discussing the project report; (2) discussing topics presented during the classes, including the theory and practical exercises. | ||
| - | **Guidelines for the project | + | |
| - | ===== Exam DM part II (DMA) ====== | + | * **Dataset** |
| + | - Assigned: 18/ | ||
| + | - MidTerm Submission: 07/ | ||
| + | - Final Submission: one week before the oral exam (complete project required). | ||
| + | - Dataset: TBD | ||
| - | The exam is composed of three parts: | + | ** DM2 Project Guidelines ** |
| + | See TBD. | ||
| - | * A **written exam**, with exercises and questions about methods and algorithms presented during the classes. It can be replaced by the first and second mid-term tests of April and June. | ||
| - | * A small **online test** for the data ethics part. The test can be taken at the following link: [[https:// | ||
| - | * An **oral exam**, that includes: (1) discussing the project report with a group presentation; | ||
| - | * A **project** consists in exercises that require the use of data mining tools for analysis of data. Exercises include: sequential patterns, time series, classification (alternative methods and validation), | ||
| - | * **Time series**: given the 50+ years long history of stock values of a company, split it into years, and study their similarities, | ||
| - | * **Sequential patterns**: discover patterns over the stock value time series above. Before that, preprocess the data by splitting it into monthly time series and discretizing them in some way. **Objective**: | ||
| - | * **(Alternative) Classification methods**: test different classification methods over a simple classification problem. **Dataset**: | ||
| - | * **Outlier detection**: | ||
| - | ====== Exam sessions ====== | + | ===== Past Exams ===== |
| + | * Past exam texts can be found on the pages of previous editions of the course. Please do not treat these exercises as the only way of testing your knowledge. Exercises may be changed and updated every year and will be published together with the slides of the lectures. | ||
| - | ===== Mid-term exams ===== | + | ===== Reading About the "Data Scientist" |
| - | ^ ^ Date ^ Hour ^ Place ^ Notes ^ Marks ^ | + | ** ... a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/ |
| - | | DM1: First Mid-term 2018 | 30.10.2018 | 11-13 | Room C1, L1, N1 | Please, use the system for registration: | + | |
| - | | DM1: Second Mid-term 2018 | 18.12.2018| 11-13 | Room C1, L1, N1 | Please, use the system for registration: | + | |
| - | | DM2: First Mid-term 2019 | 04.04.2019 | 16-18 | Room A1, E | Please, use the system for registration: | + | |
| - | ===== Regular exam sessions | + | //Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.// |
| - | ^ Session ^ Date ^ Time ^ Room ^ Notes ^ Marks ^ | + | |
| - | |1.|16.01.2019| 14:00 - 18:00| Room E | | | | + | |
| - | |2.|06.02.2019| 14:00 - 18:00| Room E | | | | + | |
| - | ===== Extra sessions A.A. 2017/18 ===== | + | * Data, data everywhere. The Economist, Feb. 2010 {{: |
| - | + | * Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011 [[http://tech.fortune.cnn.com/2011/ | |
| - | ^ Date ^ Time ^ Room ^ Notes ^ Results ^ | + | * Welcome to the yotta world. The Economist, Sept. 2011 {{: |
| + | * Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012 [[http:// | ||
| + | * Il futuro è già scritto in Big Data. Il Sole 24 Ore, Sept 2012 [[http:// | ||
| + | * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{: | ||
| + | | ||
| + | * Towards Effective Decision-Making Through Data Visualization: | ||
| ====== Previous years ===== | ====== Previous years ===== | ||
| + | * [[dm_ds2024-25]] | ||
| + | * [[dm_ds2023-24]] | ||
| + | * [[dm.2022-23ds]] | ||
| + | * [[dm.2021-22ds]] | ||
| + | * [[dm.2020-21]] | ||
| + | * [[dm.2019-20]] | ||
| + | * [[dm.2018-19]] | ||
| * [[dm.2017-18]] | * [[dm.2017-18]] | ||
| * [[dm.2016-17]] | * [[dm.2016-17]] | ||
| Line 334: | Line 273: | ||
| * [[dm.2012-13]] | * [[dm.2012-13]] | ||
| * [[dm.2011-12]] | * [[dm.2011-12]] | ||
| - | * [[dm.2010-11]] | + | |
| - | * [[dm.2009-10]] | + | |
| - | * [[dm.2008-09]] | + | |
| - | * [[dm.2007-08]] | + | |
| - | * [[dm.2006-07]] | + | |
| - | * [[PhDWorkshop2011]] | + | |
| - | * [[SNA.Ingegneria2011]] | + | |
| - | * [[SNA.IMT.2011]] | + | |
| - | * [[MAINS.SANTANNA.2011-12]] | + | |
| - | * [[MAINS.SANTANNA.DM4CRM.2012]] | + | |
| - | * [[MAINS.SANTANNA.DM4CRM.2016]] | + | |
| - | * [[MAINS.SANTANNA.DM4CRM.2017 | Data Mining for Customer Relationship Management 2017]] | + | |
| - | * [[MAINS.SANTANNA.DM4CRM.2018]] | + | |
| - | * [[MAINS.SANTANNA.DM4CRM.2019]] | + | |
| - | * [[SDM2018 | Instructions for camera ready and copyright transfer]] | + | |
| - | * [[DM-SAM | Storie dell' | + | |
dm/start.1556531416.txt.gz · Last modified: 29/04/2019 at 09:50 (7 years ago) by Mirco Nanni
