====== Data Mining (309AA) - 9 CFU A.Y. 2024/2025 ======

**Instructors:**
* **Anna Monreale**
  * KDDLab, Università di Pisa
  * [[anna.monreale@unipi.it]]
* **Mattia Setzu**
  * KDDLab, Università di Pisa
  * [[mattia.setzu@unipi.it]]

**Teaching Assistant:**
* **Lorenzo Mannocci**
  * University of Pisa
  * [[lorenzo.mannocci@phd.unipi.it]]

====== Learning Goals ======

* Fundamental concepts of knowledge discovery from data
* Data understanding
* Data preparation
* Clustering
* Classification
* Pattern Mining and Association Rules
* Outlier Detection
* Time Series Analysis
* Sequential Pattern Mining
* Ethical Issues

====== Schedule ======

**Classes**
^ Day of Week ^ Hour ^ Room ^
| Tuesday | 11:00 - 13:00 | Room C1 |
| Thursday | 14:00 - 16:00 | Room A1 |
| Friday | 09:00 - 11:00 | Room C1 |

**Office hours:**
* Anna Monreale: TBD
* Mattia Setzu: Information on [[https://unimap.unipi.it/cercapersone/dettaglio.php?ri=177323&template=dett_didattica.tpl|Unimap]]

A [[https://teams.microsoft.com/l/team/19%3Aq8IK5DrzMwEE5TxVhuw4QdYEVFJ06KVITI5jSJTmaJ81%40thread.tacv2/conversations?groupId=5fae2fa6-38fd-414f-a0c9-ffbd8e6f0710&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Teams Channel]] will be used ONLY to post news, Q&A, and other course-related material. Lectures will be held in person only and will **NOT** be live-streamed, but recordings of the lectures, or of those from previous years, will be made available here for non-attending students.

====== Teaching Material ======

**Books**
^ Title ^ Authors ^ Edition ^
| [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php|Introduction to Data Mining]] | Pang-Ning Tan, Michael Steinbach, Vipin Kumar | 2nd |
| [[https://link.springer.com/book/10.1007/978-3-031-48956-3|Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications]] | Laura Igual, Santi Seguí | 2nd |
| [[http://shop.oreilly.com/product/0636920034919.do| Python Data Science Handbook: Essential Tools for Working with Data]] | Jake VanderPlas | 1st |
| [[https://github.com/janishar/mit-deep-learning-book-pdf|Deep Learning]] | Ian Goodfellow, Yoshua Bengio, Aaron Courville | |
| [[https://math.mit.edu/~gs/linearalgebra/ila5/indexila5.html|Introduction to Linear Algebra]] | Gilbert Strang | 5th |

**Online tutorials**
^ Title ^ Authors ^
| [[https://brianmcfee.net/dstbook-site/content/intro.html|Digital Signals Theory]] | Brian McFee |
| [[https://rtavenar.github.io/blog/dtw.html|An introduction to Dynamic Time Warping]] | Romain Tavenard |
| [[https://github.com/msetzu/intro_to_ds_and_ml/blob/master/python/notebooks/Python.ipynb|Introduction to Python]] | Mattia Setzu |

**Slides**
The slides used in the course will be added to the calendar after each class. Some are taken from the slides provided by the textbook's authors: [[http://www-users.cs.umn.edu/~kumar/dmbook/index.php#item4|Slides for "Introduction to Data Mining"]].

**Software**
Software material will be available in the [[https://github.com/data-mining-UniPI/teaching23|GitHub repository]] in the coming days.

====== Class Calendar (2024/2025) ======

===== First Semester =====
^ ^ Day ^ Topic ^ Teaching material ^ References ^ Video Lectures ^ Teacher ^
| | 17.09 | Canceled | | | | |
|1. | 19.09 | Overview. Introduction to KDD | {{ :magistraleinformatica:dmi:1-overview-2024.pdf |}} {{ :magistraleinformatica:dmi:1-intro-dm-2024.pdf |}} |Chap. 1 Kumar Book | | Monreale|
|2. | 20.09 | Data Understanding + Data Preparation (Aggregation, Sampling, Dimensionality Reduction, Feature Selection, Feature Creation). | {{ :magistraleinformatica:dmi:2-data_understanding-2024.pdf |}} {{ :magistraleinformatica:dmi:3-data_preparation-2024.pdf |}}|Chap. 2 Kumar Book and additional resource from the Kumar Book: [[https://www-users.cs.umn.edu/~kumar001/dmbook/data_exploration_1st_edition.pdf|Data Exploration Chap.]] (this is Chap. 3 in the first edition of the Kumar Book) | |Monreale|
|3. | 24.09 | Data representation | {{ :magistraleinformatica:dmi:Data representation.pdf |}} | Introduction to Linear Algebra (Sections 1, 3.1, 4.2, 6.1, 6.4, 6.5, 7.3), [[https://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf|t-SNE paper]], [[https://arxiv.org/abs/1802.03426 | UMAP paper (Section 3)]] | |Setzu |
|4. | 26.09 | Data Cleaning + Transformations. Python Lab: Data Understanding and Preparation |{{ :magistraleinformatica:dmi:5-data_cleaning_transformation.pdf |}} | | |Monreale |