====== Data Mining (309AA) - 9 CFU A.Y. 2024/2025 ======

**Instructors:**
  * **Anna Monreale**
    * KDDLab, Università di Pisa
    * [[anna.monreale@unipi.it]]
  * **Mattia Setzu**
    * KDDLab, Università di Pisa
    * [[mattia.setzu@unipi.it]]

**Teaching Assistant:**
  * **Lorenzo Mannocci**
    * University of Pisa
    * [[lorenzo.mannocci@phd.unipi.it]]

====== News ======

  * [21.09.2024] **Schedule updated, see details below**
  * [14.09.2024] **The lectures will start on 19th September 2024**

====== Learning Goals ======

  * Fundamental concepts of data knowledge and discovery.
  * Data understanding
  * Data preparation
  * Clustering
  * Classification
  * Pattern Mining and Association Rules
  * Outlier Detection
  * Time Series Analysis
  * Sequential Pattern Mining
  * Ethical Issues

====== Schedule ======

**Classes**

^ Day of Week ^ Hour ^ Room ^
| Tuesday | 11:00 - 13:00 | Room C1 |
| Thursday | 14:00 - 16:00 | Room A1 |
| Friday | 09:00 - 11:00 | Room C1 |

**Office hours (Ricevimento):**
  * Anna Monreale: TBD
  * Mattia Setzu: details on [[https://unimap.unipi.it/cercapersone/dettaglio.php?ri=177323&template=dett_didattica.tpl|Unimap]]

A [[https://teams.microsoft.com/l/team/19%3Aq8IK5DrzMwEE5TxVhuw4QdYEVFJ06KVITI5jSJTmaJ81%40thread.tacv2/conversations?groupId=5fae2fa6-38fd-414f-a0c9-ffbd8e6f0710&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Teams Channel]] will be used ONLY to post news, Q&A, and other course-related material. Lectures are held in person only and will **NOT** be live-streamed, but recordings of this year's or previous years' lectures will be made available here for non-attending students.

====== Teaching Material ======

**Books**

^ Title ^ Authors ^ Edition ^
| [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php|Introduction to Data Mining]] | Pang-Ning Tan, Michael Steinbach, Vipin Kumar | 2nd |
| [[https://link.springer.com/book/10.1007/978-3-031-48956-3|Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications]] | Laura Igual, Santi Seguí | 2nd |
| [[http://shop.oreilly.com/product/0636920034919.do|Python Data Science Handbook: Essential Tools for Working with Data]] | Jake VanderPlas | 1st |
| [[https://github.com/janishar/mit-deep-learning-book-pdf|Deep Learning]] | Ian Goodfellow, Yoshua Bengio, Aaron Courville | |
| [[https://math.mit.edu/~gs/linearalgebra/ila5/indexila5.html|Introduction to Linear Algebra]] | Gilbert Strang | 5th |

**Online tutorials**

^ Title ^ Authors ^
| [[https://brianmcfee.net/dstbook-site/content/intro.html|Digital Signals Theory]] | Brian McFee |
| [[https://rtavenar.github.io/blog/dtw.html|An introduction to Dynamic Time Warping]] | Romain Tavenard |
| [[https://github.com/msetzu/intro_to_ds_and_ml/blob/master/python/notebooks/Python.ipynb|Introduction to Python]] | Mattia Setzu |

**Slides**

The slides used in the course will be added to the calendar after each class. Some are taken from the slides provided by the textbook's authors: [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item4|Slides for "Introduction to Data Mining"]].

**Software**

Software material will be available in the [[https://github.com/data-mining-UniPI/teaching23|GitHub repository]] in the coming days.

====== Class Calendar (2024/2025) ======

===== First Semester =====

^ ^ Day ^ Topic ^ Teaching material ^ References ^ Video Lectures ^ Teacher ^
| | 17.09 | Cancelled | | | | |
| 1. | 19.09 | Overview. Introduction to KDD | {{ :magistraleinformatica:dmi:1-overview-2024.pdf |}} {{ :magistraleinformatica:dmi:1-intro-dm-2024.pdf |}} | Chap. 1 Kumar Book | | Monreale |
| 2. | 20.09 | Data Understanding + Data Preparation (Aggr., Sampling, Dim. Reduction, Feature Selection, Feature Creation). | {{ :magistraleinformatica:dmi:2-data_understanding-2024.pdf |}} {{ :magistraleinformatica:dmi:3-data_preparation-2024.pdf |}} | Chap. 2 Kumar Book and additional resource of Kumar Book: [[https://www-users.cs.umn.edu/~kumar001/dmbook/data_exploration_1st_edition.pdf|Data Exploration Chap.]] In the first edition of the Kumar book, this is Chap. 3. | | Monreale |
| 3. | 24.09 | Data representation | {{ :magistraleinformatica:dmi:Data representation.pdf |}} | Introduction to Linear Algebra (Sections 1, 3.1, 4.2, 6.1, 6.4, 6.5, 7.3), [[https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf|t-SNE paper]], [[https://arxiv.org/abs/1802.03426|UMAP paper (Section 3)]] | | Setzu |
| 4. | 26.09 | Data Cleaning + Transformations. Python Lab: Data Understanding and Preparation | {{ :magistraleinformatica:dmi:5-data_cleaning_transformation.pdf |}} | | | Monreale |

====== Exams ======

TBD

====== Previous years ======

  * [[DM-INF 2023-2024]]
  * [[DM-INF 2022-2023]]
  * [[DM-INF 2021-2022]]
  * [[DM-INF 2020-2021]]
  * [[http://didawiki.cli.di.unipi.it/doku.php/dm/dm.2019-20|DM-2019/20]]