magistraleinformatica:dmi:start

Differences

These are the differences between the selected revision and the current version of the page.

magistraleinformatica:dmi:start [30/09/2021 at 23:12 (4 years ago)] – [First Semester] Anna Monreale magistraleinformatica:dmi:start [18/12/2025 at 14:09 (16 hours ago)] (current version) Anna Monreale
Line 1: Line 1:
-====== Data Mining (309AA) - 9 CFU A.Y. 2021/2022 ====== +====== Data Mining (309AA) - 9 CFU A.Y. 2025/2026 ======
  
-**Instructor:** +**Instructors:**
   * **Anna Monreale**   * **Anna Monreale**
     * KDDLab, Università di Pisa     * KDDLab, Università di Pisa
     * [[anna.monreale@unipi.it]]        * [[anna.monreale@unipi.it]]   
 +  * **Mattia Setzu**
 +    * KDDLab, Università di Pisa
 +    * [[mattia.setzu@unipi.it]]   
 +
 **Teaching Assistant:** **Teaching Assistant:**
-  * **Francesca Naretto** +  * **Lorenzo Mannocci** 
-    * KDDLab, SNS, Pisa +    * University of Pisa 
-    * [[francesca.naretto@sns.it]]  +    * [[lorenzo.mannocci@di.unipi.it]]  
  
 ====== News ====== ====== News ======
-  * **[23.09.2021] Please fill in this document: [[https://docs.google.com/spreadsheets/d/1YzHs_JSYPWYqnmkM7ccQc1WZSzGP7UsgxdBF-h5LcEA/edit?usp=sharing|Student lists and project groups]]. On Teams you can find instructions for the GroupID.** +  * [18-11-2025]: Project deadline available: January 5th, 2026.  
-  * [06.09.2021] The first lecture of this course will take place on Thursday, 16 Sept 2021. +  * [23-09-2025] Please register yourself and your group for the project. Group registration is available [[https://docs.google.com/spreadsheets/d/1Xl8Hd-giIuJQw0x2NDkXjbGZ2REGF-OukqC5XGU6pzA/edit?gid=0#gid=0|here]]. 
-  * [08.09.2021] People who intend to attend the course online should use this link: https://teams.microsoft.com/l/team/19%3aWKvq4kg0XbKZ5pEeiZcarbBXPCYsTvTwMkKZs2PWiHA1%40thread.tacv2/conversations?groupId=aea1385b-6721-4d90-a169-c97f7d066eca&tenantId=c7456b31-a220-47f5-be52-473828670aa1    + 
 ====== Learning Goals ====== ====== Learning Goals ======
-     * Fundamental concepts of data knowledge and discovery +The Data Mining course tackles the analysis of large collections of data, and the extraction of information and patterns. It aims to explore core components of the Knowledge Discovery from Data (KDD) process, and focuses on: 
-     * Data understanding +  * Data understanding 
-     * Data preparation +  * Data cleaning, preparation, and transformation 
-     * Clustering +  * Data analysis: outlier detection and data representation 
-     * Classification & Regression +  * Data clustering 
-     * Pattern Mining and Association Rules +  * Pattern extraction: itemsets, rules, association rules, and sequential patterns 
-     * Outlier Detection +  * Inference models: trees and ensemble models 
-     * Time Series Analysis +  * Responsible data use: privacy and interpretability
-     * Sequential Pattern Mining +
-     * Ethical Issues +
  
-====== Hours and Rooms ====== +====== Schedule ======
  
 **Classes** **Classes**
  
 ^  Day of Week  ^  Hour  ^  Room  ^  ^  Day of Week  ^  Hour  ^  Room  ^ 
-|  Wednesday |  14:00 - 16:00  |  Room C  - Online  |  +|  Tuesday   |  11:00 - 13:00  |  Room C  |  
-|  Thursday  |  14:00 - 16:00  |  Room C  - Online  |  +|  Wednesday |  14:00 - 16:00  |  Room C  |  
-|  Friday    |  09:00 - 11:00  |  Room A1 - Online  |  +|  Thursday  |  14:00 - 16:00  |  Room A1  | 
  
  
  
 **Office hours - Ricevimento:** **Office hours - Ricevimento:**
-Anna Monreale: Wednesday: 11:00-13:00 online using Teams (Appointment by email) +  * Anna Monreale: TBD. Online using Teams or in my office (appointment by email) 
-Francesca Naretto: Monday: 15:00-18:00 online using Teams (Appointment by email) +  * Mattia Setzu: Info on [[https://unimap.unipi.it/cercapersone/dettaglio.php?ri=177323&template=dett_didattica.tpl|Unimap]]
  
-  +A [[ https://teams.microsoft.com/l/team/19%3Ai_Ge38xXm8FdnepLNud6ddbz_OECbBPRKfA1UKbUsQo1%40thread.tacv2/conversations?groupId=41e56778-e965-462a-9fef-250df0ee7055&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Teams Channel]] will be used ONLY to post news, Q&A, and other course-related material. Lectures are held in person only and will **NOT** be live-streamed.
-====== Learning Material -- Materiale didattico ======+
  
-===== Textbook -- Libro di Testo ===== 
  
-  * Pang-Ning Tan, Michael Steinbach, Vipin Kumar. **Introduction to Data Mining**. Addison Wesley, ISBN 0-321-32136-7, 2006 +====== Teaching Material ======
-    * [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php]] +
-    * Chapters 4,6 and 8 are also available at the publisher's Web site. +
-  * Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. **GUIDE TO INTELLIGENT DATA ANALYSIS.** Springer Verlag, 1st Edition., 2010. ISBN 978-1-84882-259-7 +
-  * Laura Igual et al.** Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications**. 1st ed. 2017 Edition. +
-  *  Jake VanderPlas. **[[http://shop.oreilly.com/product/0636920034919.do| Python Data Science Handbook: Essential Tools for Working with Data.]]** 1st Edition.  +
-  *  For Python Notions: {{ :magistraleinformatica:dmi:python_basics.ipynb.zip | Very basic notions on Python}} +
  
 +**Books**
 +^ Title ^ Authors ^ Edition ^
 +| [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php|Introduction to Data Mining]] | Pang-Ning Tan, Michael Steinbach, Vipin Kumar | 2nd |
 +| [[https://link.springer.com/book/10.1007/978-3-031-48956-3|Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications]] | Laura Igual,  Santi Seguí | 2nd |
 +| [[http://shop.oreilly.com/product/0636920034919.do| Python Data Science Handbook: Essential Tools for Working with Data]] | Jake VanderPlas | 1st |
 +| [[https://github.com/janishar/mit-deep-learning-book-pdf|Deep Learning]] | Ian Goodfellow, Yoshua Bengio, Aaron Courville | |
 +| [[https://math.mit.edu/~gs/linearalgebra/ila5/indexila5.html|Introduction to Linear Algebra]] | Gilbert Strang | 5th |
  
-===== Slides ===== 
  
-  The slides used in the course will be inserted in the calendar after each class. Most of them are part of the slides provided by the textbook's authors [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item4|Slides per "Introduction to Data Mining"]]. +**Online tutorials**
-   +
  
-   +^ Title ^ Authors ^ 
-===== Software=====+| [[https://brianmcfee.net/dstbook-site/content/intro.html|Digital Signals Theory]] | Brian McFee | 
 +| [[https://rtavenar.github.io/blog/dtw.html|An introduction to Dynamic Time Warping]] | Romain Tavenard | 
 +| [[https://github.com/msetzu/intro_to_ds_and_ml/blob/master/python/notebooks/Python.ipynb|Introduction to Python]] | Mattia Setzu |
  
-  * Python - Anaconda (3.7 version!!!): Anaconda is the leading open data science platform powered by Python. [[https://www.anaconda.com/distribution/| Download page]] (the following libraries are already included) 
-  * Scikit-learn: python library with tools for data mining and data analysis [[http://scikit-learn.org/stable/ | Documentation page]] 
-  * Pandas: pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. [[http://pandas.pydata.org/ | Documentation page]] 
  
-  +**Slides** 
-====== Class Calendar (2021/2020) ======+ 
 +The slides used in the course will be added to the calendar after each class. Some are part of the slides provided by the textbook's authors: [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item4|Slides for "Introduction to Data Mining"]]. 
 + 
 +===== Past Exercises and Past Exams of Similar Courses  ===== 
 +  * Exercises on Clustering: {{ :dm:ex._clustering.pdf |}} 
 +  * Some texts of past exams of a similar course: {{ :dm:2017-1-19.pdf |}}, {{ :dm:2017-9-6.pdf |}}, {{ :dm:2016-05-30-dm1-seconda.pdf |}}, {{ :dm:dm2_exam.2017.06.13_solutions.pdf |}}, {{ :dm:dm2_exam.2017.07.04_solutions.pdf |}}, {{ :dm:dm2_mid-term_exam.2017.06.06_solutions.pdf |}} 
 +  * Some exercises (partially with solutions) on **sequential patterns** and **time series** can be found in the following texts of exams from the last years: {{ :dm:dm2_exam.2015.04.13.results.pdf|}}, {{ :dm:dm2_exam.2016.04.4_sol.pdf |}}, {{ :dm:dm2_exam.2016.04.5_sol.pdf |}}, {{ :dm:dm2_exam.2016.06.20_sol.pdf |}}, {{ :dm:dm2_exam.2016.07.08_sol.pdf |}} 
 +   * Some very old exercises (some with solutions) are available here. Most are in Italian, and not all cover topics in this year's program: {{tdm:verifica2006.pdf|Test 2006}}, {{tdm:verifica2005.pdf|Test 2005 (with solutions)}}, {{tdm:verifica2004.pdf|Test 2004}}, {{dm:verifica.05.06.2007.pdf|Test 5 June 2007}}, {{dm:verifica.26.06.2007.pdf|Test 26 June 2007}}, {{dm:verifica.24.07.2007_corretto.pdf|Test 24 July 2007}} (and {{:dm:soluzioni.2008.04.03.pdf|Solutions}}), {{:dm:dm-tdm.appello_2008_07_18_parte1.pdf|Test 18 July 2008 - part 1}}, {{:dm:dm-tdm.appello_2008_07_18_parte2.pdf|Test 18 July 2008 - part 2}}, {{:dm:appello.2010.06.01_soluzioni.pdf|Exam with solution 2010-06-01}}, {{:dm:appello.2010.06.22_soluzioni.pdf|Exam with solution 2010-06-22}}, {{:dm:appello.2010.09.09_soluzioni.pdf|Exam with solution 2010-09-09}}, {{:dm:appello.2010.07.13_soluzioni.pdf|Exam with solution 2010-07-13}} 
 + 
 + 
 + 
 + 
 +     
 +====== Class Calendar (2025/2026) ======
  
 ===== First Semester  ===== ===== First Semester  =====
  
-^ ^ Day ^ Topic ^ Learning material ^ References ^ Video Lectures ^ +^ ^ Day ^ Topic ^ Teaching material ^ References ^ Teacher ^ 
-|  |  15.09  14:15‑16:00  | Lecture deleted  | | | | +|1.  |  18.09  | Course Overview. Introduction to Data Mining |  {{ :magistraleinformatica:dmi:intro_dm.pdf |Introduction to DM}} | Chap. 1 Kumar Book | Setzu | 
-|1.|  16.09  14:15‑16:00  | Overview. Introduction to KDD  | {{ :magistraleinformatica:dmi:2021-1-overview.pdf |}}{{ :magistraleinformatica:dmi:1-intro-dm.pdf |}} | Chap. Kumar Book | [[https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Meeting%20in%20_General_-20210916_140839-Meeting%20Recording.mp4?web=1|Video 1]]  [[ https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Meeting%20in%20_General_-20210916_151538-Meeting%20Recording.mp4?web=1|Video 2]] | +|  |  23.09  | Canceled due to the teacher's health issues |  | | | 
-|2.|  17.09   09:00-10:45 | Data Understanding | {{ :magistraleinformatica:dmi:2-data_understanding.pdf | Slides DU}} | Chap. 2 Kumar Book and additional resource of Kumar Book: [[https://www-users.cs.umn.edu/~kumar001/dmbook/data_exploration_1st_edition.pdf|Exploring Data]]. If you have the first ed. of KUMAR this is Chap. 3 | [[ https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Data%20Mining%20Lecture-20210917_071017-Meeting%20Recording.mp4|Video 1]]  [[ https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Data%20Mining%20Lecture-20210917_101809-Meeting%20Recording.mp4|Video 2]] | +|2.  |  24.09  | Data Understanding + Data Preparation        | {{ :magistraleinformatica:dmi:data_understanding.pdf |}} {{ :magistraleinformatica:dmi:data_preparation_and_cleaning.pdf | Data Preparation}}| Chap. 2 Kumar Book and additional resource of Kumar Book: [[https://www-users.cs.umn.edu/~kumar001/dmbook/data_exploration_1st_edition.pdf|Data Exploration Chap.]] If you have the first ed. of KUMAR this is Chap. 3 |Setzu | 
-|3.|  22.09  14:15-16:00 | Data Understanding + Data Preparation        | {{ :magistraleinformatica:dmi:3-data_preparation.pdf |}} | Chap. 2 Kumar Book | [[https://unipiit.sharepoint.com/sites/a__td_50479/Shared%20Documents/General/Recordings/Data%20Mining%20Lecture-20210922_120312-Meeting%20Recording.mp4|Video]] | +|3.  |  25.09  | Data representation      |{{ :magistraleinformatica:dmi:data_representation.pdf |}} | References: Introduction to linear algebra (Sections 1, 3.1, 4.2, 6.1, 6.4, 6.5, 7.3), [[https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf|t-SNE paper]], [[https://arxiv.org/abs/1802.03426|UMAP paper (Section 3)]]  |Setzu | 
-|4.|  23.09  14:15-16:00 | Data Preparation + Data Similarities. |{{ :magistraleinformatica:dmi:4-data_similarity.pdf |}}       | Data Similarity is in Chap |  | +|4.  |  30.09  | Data Cleaning + Transformations. PyLab: Data Understanding     | {{ :magistraleinformatica:dmi:5-data_cleaning_transformation.pdf | Data Cleaning & Transformations}}| | Monreale, Mannocci | 
-|5.|  24.09  09:00-10:45 | Introduction to Clustering. Center-based clustering: kmeans | {{ :magistraleinformatica:dmi:5-basic_cluster_analysis-intro.pdf |}}  {{ :magistraleinformatica:dmi:6.1-basic_cluster_analysis-kmeans.pdf |}} | Clustering is in Chap |  | +|5.  |  01.10  | PyLab: Data Understanding + Preparation    |{{ :magistraleinformatica:dmi:1_basics_and_understanding.ipynb.zip |}} {{ :magistraleinformatica:dmi:2_feature_engineering_and_data_representation.ipynb.zip |}} {{ :magistraleinformatica:dmi:data_notebook.zip |}}| | Monreale, Mannocci | 
-|6.|  29.09  14:15-16:00 | Hierarchical clustering       | {{ :magistraleinformatica:dmi:7.basic_cluster_analysis-hierarchical.pdf |}} | Chap. 7 Kumar Book |  | +|6.  |  02.10  | Similarities + Introduction to Clustering and Centroid-based clustering  | {{ :magistraleinformatica:dmi:6-data_similarity.pdf |}} {{ :magistraleinformatica:dmi:6-basic_cluster_analysis-intro.pdf |}} {{ :magistraleinformatica:dmi:8-basic_cluster_analysis-kmeans.pdf |}}| | Monreale | 
-|7.|  30.09  14:15-16:00 | Density based clustering. Clustering validity. Lab: DU | {{ :magistraleinformatica:dmi:8.basic_cluster_analysis-dbscan-validity.pdf |}}   {{ :magistraleinformatica:dmi:du.zip |Notebooks on Data Understanding}} | Chap. 7 Kumar Book |  | +|7.  |  07.10  | K-means   | {{:magistraleinformatica:dmi:8-basic_cluster_analysis-kmeans.pdf |}}| | Monreale | 
-|8.|  01.10  09:00-10:45 | Python Lab: DU + Clustering      | +|8.  |  08.10  | Hierarchical Clustering + Density Based Clustering + Validity   | {{ :magistraleinformatica:dmi:9-basic_cluster_analysis-hierarchical.pdf |}} {{ :magistraleinformatica:dmi:8.basic_cluster_analysis-dbscan-validity.pdf |}} |  | Monreale | 
 +| 9. | 14.10 | Clustering evaluation and Python notebooks | {{ https://didawiki.cli.di.unipi.it/lib/exe/fetch.php/magistraleinformatica/dmi/12-basic_cluster_analysis-validity.pdf | Clustering validation}} {{ :magistraleinformatica:dmi:3_clustering.ipynb.zip |}} | | Setzu, Mannocci | 
 +| 10. | 15.10 | Anomaly detection | {{ https://github.com/data-mining-UniPI/teaching25/blob/lectures/anomaly%20detection/Anomaly%20detection.html.pdf | Slides }} | | Setzu | 
 +| 11. | 16.10 | Anomaly detection | {{ https://github.com/data-mining-UniPI/teaching25/blob/lectures/anomaly%20detection/Anomaly%20detection.html.pdf | Slides }}, {{ https://github.com/data-mining-UniPI/teaching25/blob/main/notebooks/outliers.ipynb | Notebook }}, {{ https://github.com/data-mining-UniPI/teaching25/blob/main/notebooks/isolation_forest.py | Rule extraction from isolation forests }} | | Setzu | 
 +|12.  |  21.10  | Variants of K-means + Association Rule Mining | {{ :magistraleinformatica:dmi:11-basic_cluster_analysis-kmeans-variants.pdf |}} {{ :magistraleinformatica:dmi:17_association_analysis2023.pdf |}} | | Monreale  |  
 +|13.  |  22.10  | Association Rule Mining: Apriori | {{ :magistraleinformatica:dmi:17_association_analysis2023.pdf |}} | | Monreale  |  
 +|14.  |  23.10  | Association Rule Mining: CORELS | {{ https://github.com/data-mining-UniPI/teaching25/blob/lectures/rule_mining/Rule%20extraction.html.pdf | Slides }}, {{ https://corels.cs.ubc.ca/corels/index.html | Online tool }} | | Setzu  |  
 +|15.  |  28.10  | Visual Analytics  | {{ :magistraleinformatica:dmi:dm_intro_dataviz_vegaaltair.pdf |Slides}} {{ :magistraleinformatica:dmi:1_bis_basics_and_understanding_altair.ipynb.zip | Code for data visualization with Altair}}| |Monreale, Rinzivillo| 
 +|16.  |  29.10  | Association Rule Mining: FP-Growth + Sequential Pattern Mining| {{ :magistraleinformatica:dmi:17_2023-fp-growth.pdf |FP-Growth}}{{ :magistraleinformatica:dmi:18_sequential_patterns_2024.pdf |SPM}}| |Monreale| 
 +|  |  30.10  | Lecture is canceled| | | | 
 +|17.  |  04.11  | Sequential Pattern Mining with time constraints + Python Lab: FPM + SPM.| For SPM the same set of slides used in the previous lecture {{ :magistraleinformatica:dmi:5_patternmining.ipynb.zip |}} | | Monreale| 
 +|18.  |  05.11  | Supervised learning and classification | {{ :magistraleinformatica:dmi:supervisinglearning.pdf | Slides}}| | Setzu | 
 +|19.  |  06.11  | Classification: Decision Trees | {{ :magistraleinformatica:dmi:2025-dt_classification.pdf |Decision Trees }} [[https://unipiit.sharepoint.com/:v:/s/a__td_69096/ESuyvNgtPWxPoLRspBH9q3IB2cvZE9o6a0DRZFQP2gbNww?e=VqxUqL|Video]] | | Monreale | 
 +|20.  |  07.11  | Classification: Decision Trees  | | | Monreale | 
 +|21.  |  11.11  | Classification: Decision Trees & evaluation + Decision Rules| {{ :magistraleinformatica:dmi:classificationmodelevaluation-2025.pdf |Evaluation}} {{ :magistraleinformatica:dmi:10-rule-based-classifiers.pdf | Decision Rules}}  | | Monreale | 
 +|22.  |  12.11  | Classification: Decision Rules + Instance based methods + Q&A for Project work| {{ :magistraleinformatica:dmi:10-knn.pdf |}} | | Monreale | 
 +|23.  |  13.11  | Exercises: DT simulation, clustering, sequences | {{ :magistraleinformatica:dmi:dt-learning-simulation.pdf |}} {{ :magistraleinformatica:dmi:learnedtree.pdf |}}{{ :magistraleinformatica:dmi:2025-ex-clustering.pdf |}} {{ :magistraleinformatica:dmi:ex-sequences.pdf |}}| | Monreale | 
 +|24.  |  18.11  | Advanced Decision Trees, GAMs, and ensemble models | {{ https://github.com/data-mining-UniPI/teaching25/blob/lectures/machine%20learning/Supervised%20tasks.html.pdf | Slides }} | | Setzu | 
 +|25.  |  25.11  | Neural networks | {{ https://github.com/data-mining-UniPI/teaching25/blob/lectures/machine%20learning/Networks.pdf | Slides }} | | Setzu | 
 +|26.  |  26.11  | Time series, Python Supervised Learning & Imbalanced Scenarios | {{ https://github.com/data-mining-UniPI/teaching25/blob/lectures/time%20series/Time%20series.html.pdf | Slides }} {{ :magistraleinformatica:dmi:supervised_learning.zip |}} {{ :magistraleinformatica:dmi:data_notebook.zip |}} | | Setzu, Mannocci | 
 +|27.  |  27.11  | Time series, Python Supervised Learning & Imbalanced Scenarios | {{ https://github.com/data-mining-UniPI/teaching25/blob/lectures/time%20series/Time%20series.html.pdf | Slides }}, {{ https://github.com/data-mining-UniPI/teaching25/blob/lectures/time%20series/Time%20series.html | Slides in HTML (w/ working animation) }} | | Setzu | 
 +|28.  |  02.12  | Shapelet-based Classification, Motif discovery | {{ :magistraleinformatica:dmi:23_time_series_motif-shapelets2023.pdf |Slides}} | {{ :magistraleinformatica:dmi:shaplet.pdf |}} {{ :magistraleinformatica:dmi:matrixprofile.pdf |}} [[https://www.cs.ucr.edu/~eamonn/MatrixProfile.html|Papers and resources on motifs]] |Monreale | 
 +|29.  |  03.12  | Py: Time Series|{{ :magistraleinformatica:dmi:timeseries.zip |}}| | Monreale, Mannocci | 
 +|30.  |  04.12  | Responsible AI: introduction and EU Regulations | {{ :magistraleinformatica:dmi:19_rai_privacy2025.pdf | Slides}}| |Monreale | 
 +|31.  |  09.12  | Responsible AI: privacy. | Same slides as the previous lecture | {{ :magistraleinformatica:dmi:chap-anonymity.pdf |}} [[https://arxiv.org/abs/1610.05820|MIA attack against ML]]| Monreale| 
 +|32.  |  10.12  | Responsible AI: Explainable AI |{{ :magistraleinformatica:dmi:20_explainability_2025.pdf |XAI}}|[[https://christophm.github.io/interpretable-ml-book/|Digital book where students can find some basic XAI models and notions]] {{ :magistraleinformatica:dmi:xai-taxonomy-survey.pdf | XAI Survey describing the taxonomy and dimensions of XAI}} {{ :magistraleinformatica:dmi:lore-j.pdf | LORE approach}}, {{ :magistraleinformatica:dmi:abele-approach.pdf |ABELE approach}}{{ :magistraleinformatica:dmi:lasts_-_explaining_any_time_series_classifier_2_.pdf |LASTS}} [[https://arxiv.org/abs/1705.07874|SHAP]][[https://arxiv.org/abs/1602.04938|LIME]]|Monreale| 
 +|33.  |  11.12  | XAI Python Notebook + Private and explainable FL, Assessing privacy in XAI  | {{ :magistraleinformatica:dmi:xai-tutorial.ipynb.zip |XAI Notebook}} {{ :magistraleinformatica:dmi:11-dic-2025-xai.pdf |Slides}} |{{ :magistraleinformatica:dmi:glor-flex_local_to_global_rule-based_explanations_fl.pdf |GLOR-FLEX}} {{ :magistraleinformatica:dmi:fastshap-ex-pri.pdf |FASTSHAP++}} [[https://www.tdp.cat/issues21/tdp.a534a24.pdf|REVEAL]]|Naretto| 
 +|34.  |  16.12  |Project Presentations - second check - ONLINE - **MANDATORY**| 
 +|35.  |  17.12  |Project Presentations - second check - ONLINE - **MANDATORY**| 
 +|36.  |  18.12  |Project Presentations - second check - ONLINE - **MANDATORY**| 
 +====== Exam ====== 
 + 
 +The exam can be taken in one of two ways:
  
 +**Project track**: 
 +  * Project (70% of the final score) to be delivered after the end of the course
 +  * Oral exam (30% of the final score)
 +During the course, you will have some “Project presentation” sessions wherein you’ll briefly (~3 minutes) present your work, and receive feedback from the lecturers. These sessions do not contribute to your grade.
  
-====== Exams ====== +**Written test track** 
-**Mid-term Project ** +  * Written exam (70% of the final score): held during the exam sessions after the end of the course; it can include both theoretical questions and exercises. 
 +  * Oral exam (30% of the final score) 
 +Note that a passing grade for the project/written exam is required to be admitted to the oral exam.
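
The weighting described above can be sketched in a few lines of Python. This is only an illustration: the page states the 70%/30% weights and the admission rule, but not the grading scale or any rounding, so the 0-30 scale, the passing grade of 18, and the name `final_score` are assumptions made for this example.

```python
from typing import Optional

# Weights stated in the exam rules above.
MAIN_WEIGHT = 0.7   # project or written exam
ORAL_WEIGHT = 0.3   # oral exam

# ASSUMPTION: grades are on a 0-30 scale and 18 is the passing grade.
# The page does not state the scale; adjust this to the official rule.
PASSING_GRADE = 18.0


def final_score(main_grade: float, oral_grade: Optional[float]) -> Optional[float]:
    """Return the weighted final score, or None when no final score exists."""
    if main_grade < PASSING_GRADE:
        # A passing project/written grade is required to be admitted to the oral exam.
        return None
    if oral_grade is None:
        # Oral exam not taken yet.
        return None
    return MAIN_WEIGHT * main_grade + ORAL_WEIGHT * oral_grade


# Hypothetical grades, for illustration only.
print(final_score(27, 30))
print(final_score(16, None))
```

The second call returns `None` because, under the assumed threshold, a main grade of 16 does not admit the student to the oral exam.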
  
 +**Project Guidelines:** 
 A project consists in data analyses based on the use of data mining tools.  A project consists in data analyses based on the use of data mining tools. 
-The project has to be performed by a team of 2/3 students. It has to be performed by using Python. The guidelines require to address specific tasks. Results must be reported in a unique paper. The total length of this paper must be max 20 pages of text including figures. The students must deliver both: paper (single column) and  well commented Python Notebooks. +The project must be carried out by a team of 3 students, using Python. The guidelines specify the tasks to address. Results must be reported in a single paper of at most 25 pages of text, including figures. Students must deliver both the paper (single column) and well-commented Python notebooks.
  
-  +Specifically, if any of these tasks appear in the project track, make sure to focus on the following:
-** Paper Presentation (OPTIONAL)**+
  
-Students need to present a research paper (made available by the teacher) during the last week of the course. This presentation is OPTIONAL: Students that decide to do the paper presentation can avoid the oral exam with open questions. They only need to present the project (see next point).+**Data understanding** 
 +  * An analysis of all variables, their relations, distributions, and quality 
 +  * Feature imputation and/or selection, where needed 
 +  * The engineering of additional features, including the aforementioned analyses
  
-**Oral Exam** +**Clustering Analysis** 
-  * **Project presentation** (with slides) – 10 minutes: mandatory for all the students +  * A properly justified feature selection phase 
-  * ** Open questions ** on the entire program: optional only for students opting for paper presentation. +  * Tackling all clustering families, exploring their respective hyperparameters 
-  +  * An analysis of the best clusterings per family, including cluster description 
- +  * A comparison of the best clusterings per family
  
 +**Anomaly detection**
 +  * A selection of outliers through appropriate algorithms
 +  * An interpretation of such outliers
 +  * An analysis of the impact of the outliers on the previously performed data understanding
 +
 +**Time series analysis**
 +  * Appropriate representation choice for the task at hand
 +
 +**Supervised learning**
 +  * Feature selection
 +  * Test different families of models
 +  * Proper model validation, including both model performance and model complexity
 +  * Comparison of the best models of each family
 +
 +**Explainability**
 +
 +  * Justified selection of instances to explain
 +  * Analysis of the explanations
 +
 +**Project and Deadlines** 
 +Information about the dataset to be analyzed and project description:
 +  * **Dataset.** https://drive.google.com/file/d/1K9garfm03-PFUMYyOenH9kqEJ7D5RrmD/view?usp=sharing
 +  * **Project description.** {{ :magistraleinformatica:dmi:data_mining_project.pdf |}}
 +  * **Project description Task 4.** {{ :magistraleinformatica:dmi:data_mining_project2.pdf |}}
 +  * **Dataset Task 4.** https://drive.google.com/file/d/1Li2roWMoREN6_nKy-trB7pXWDA1xkAzh/view?usp=sharing
 +  * **Project description Task 5.** 
 +  * **Project Question & Answers.** {{ :magistraleinformatica:dmi:25-26-data_mining_project_includingt5.pdf |Complete Project Description}}
 +  * **Deadline.** January 5th, 2026.
 +  * **Delivery instructions.** The final deadline of the project is **5th January 2026 at 23:59**. This deadline is **STRICT**: no extension is possible, because the winter exam session starts afterwards. **Groups that do not deliver the project by 5th January will need to take the written exam during the exam sessions.** Each group must deliver by email to anna.monreale@unipi.it, mattia.setzu@unipi.it, and lorenzo.mannocci@di.unipi.it a zipped folder named DM_GroupID.zip containing 4 folders and 1 pdf file:
 +    * a folder named DM_GroupID_TASK1, containing the source code of data understanding;
 +    * a folder named DM_GroupID_TASK2, containing the source code of data clustering;
 +    * a folder named DM_GroupID_TASK3, containing the source code of classification and explanation analysis;
 +    * a folder named DM_GroupID_TASK4, containing the source code of time series analysis;
 +    * a pdf file named DM_Report_GroupID.pdf, of at most 25+2 pages including figures, discussing the results of the tasks (25 pages for tasks 1-4 and 2 pages for task 5). The file must contain the list of authors (i.e., the members of the group).
 +  * **The subject of the email must be “DMProject25_GroupID”**
 +  * **How to book the oral exam?** On https://esami.unipi.it/ you can find two exam dates: one in January and one in February. Each student must register for one of the two dates. These are not the dates of the oral exam itself: we will use the list of registered students to organize the exam calendar, which we will then share with you.
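
The naming scheme in the delivery instructions can be checked mechanically before sending the email. The sketch below is not an official tool: the helper names `expected_layout` and `missing_items`, the example group ID `G07`, and the reading of "GroupID" as a placeholder replaced by the group's own ID are all assumptions made for illustration.

```python
import zipfile
from pathlib import Path

TASK_COUNT = 4  # the delivery instructions list folders TASK1..TASK4


def expected_layout(group_id: str) -> dict:
    """Names required by the delivery instructions for one group.

    ASSUMPTION: "GroupID" in the instructions is a placeholder that each
    group replaces with its own ID.
    """
    return {
        "archive": f"DM_{group_id}.zip",
        "folders": [f"DM_{group_id}_TASK{i}" for i in range(1, TASK_COUNT + 1)],
        "report": f"DM_Report_{group_id}.pdf",
        "email_subject": f"DMProject25_{group_id}",
    }


def missing_items(zip_path: Path, group_id: str) -> list:
    """Return the required folders and report that are absent from the archive."""
    layout = expected_layout(group_id)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
    missing = []
    for folder in layout["folders"]:
        # ASSUMPTION: a folder may sit at the archive root or under a
        # single top-level directory, so match it anywhere in the path.
        if not any(f"{folder}/" in name for name in names):
            missing.append(folder)
    if not any(name.endswith(layout["report"]) for name in names):
        missing.append(layout["report"])
    return missing


# Hypothetical group ID, for illustration only.
print(expected_layout("G07"))
```

Calling `missing_items(Path("DM_G07.zip"), "G07")` on a prepared archive returns an empty list when all four task folders and the report are present.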
  
-===== Reading About the "Data Scientist" Job ===== 
  
-** ... a new kind of professional has emerged, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data. Hal Varian, Google’s chief economist, predicts that the job of statistician will become the "sexiest" around. Data, he explains, are widely available; what is scarce is the ability to extract wisdom from them. ** 
  
-//Data, data everywhere. The Economist, Special Report on Big Data, Feb. 2010.// 
  
-  * Data, data everywhere. The Economist, Feb. 2010 {{:dm:economist--010.pdf|download}} 
-  * Data scientist: The hot new gig in tech, CNN & Fortune, Sept. 2011 [[http://tech.fortune.cnn.com/2011/09/06/data-scientist-the-hot-new-gig-in-tech/|link]] 
-  * Welcome to the yotta world. The Economist, Sept. 2011 {{:dm:economist-2012-dm.pdf|download}} 
-  * Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, Sept 2012 [[http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1|link]] 
-  * Il futuro è già scritto in Big Data. Il SOle 24 Ore, Sept 2012 [[http://www.ilsole24ore.com/art/tecnologie/2012-09-21/futuro-scritto-data-155044.shtml?uuid=AbOQCOhG|link]] 
-  * Special issue of Crossroads - The ACM Magazine for Students - on Big Data Analytics {{:dm:crossroadsxrds2012fall-dl.pdf|download}} 
-  * Peter Sondergaard, Gartner, Says Big Data Creates Big Jobs. Oct 22, 2012: [[https://www.youtube.com/watch?v=mXLy3nkXQVM|YouTube video]] 
  
-  * Towards Effective Decision-Making Through Data Visualization: Six World-Class Enterprises Show The Way. White paper at FusionCharts.com. [[http://www.fusioncharts.com/whitepapers/downloads/Towards-Effective-Decision-Making-Through-Data-Visualization-Six-World-Class-Enterprises-Show-The-Way.pdf|download]] 
  
 ====== Previous years ====== ====== Previous years ======
 +[[DM-INF 2024-2025]]
 +
 +[[DM-INF 2023-2024]]
 +
 +[[DM-INF 2022-2023]]
 +
 +[[DM-INF 2021-2022]]
 +
 [[DM-INF 2020-2021]] [[DM-INF 2020-2021]]
  
magistraleinformatica/dmi/start.1633043521.txt.gz · Last modified: 30/09/2021 at 23:12 (4 years ago) by Anna Monreale
