tdm:biss09

Current version: 07/04/2009 at 11:11, by Dino Pedreschi (previous revision: 08/03/2009 at 13:08, also by Dino Pedreschi).
  * Co-author of the course: **Fosca Giannotti** (KDD LAB, ISTI-CNR, Pisa) [[fosca.giannotti@isti.cnr.it]]
  
  * Acknowledgements to colleagues **Vipin Kumar** (University of Minnesota), **Jiawei Han** (University of Illinois at Urbana-Champaign), **Mirco Nanni** (KDD LAB, ISTI-CNR, Pisa) and **Francesco Bonchi** (Yahoo! Research, Barcelona)
  
====== Summary ======
  
Since databases became a mature technology and the massive collection and storage of data became feasible at ever lower cost, a push emerged towards powerful methods for discovering knowledge from those data, capable of going beyond the limitations of traditional statistics, machine learning and database querying. This is why data mining emerged as an important multidisciplinary field. Data mining is the process of automatically discovering useful information in large data repositories. Often, traditional data analysis tools and techniques cannot be used because of the sheer volume of the data, as with point-of-sale data, Web logs, earth observation data from satellites, genomic data, and location data from telecom service providers. In other cases, the non-traditional nature of the data means that ordinary data analysis techniques are not applicable. Today, data mining is both a technology that blends data analysis methods with sophisticated algorithms for processing large data sets, and an active research field that aims at developing new data analysis methods for novel forms of data. This course aims to provide a succinct account of the foundations of data mining, together with an overview of the most advanced topics and application areas and of the current frontiers of data mining research.

The first part of the course (Data mining - foundations) covers the basic concepts, the knowledge discovery process, mining various forms of data (relational, transactional, object-relational, spatiotemporal, text, multimedia, web, etc.), mining various forms of knowledge (classification, clustering and frequent patterns), evaluation of knowledge, and key applications of data mining. The second part of the course (Data mining - advanced concepts and case studies) gives an introductory account of the frontiers of data mining research: sequential data mining, mining data streams, web mining, social network analysis, graph and network mining, spatiotemporal data and mobility data mining, and privacy-preserving data mining, together with presentations of real-life case studies in various domains, including retail and market analysis, fiscal fraud detection, transportation and mobility.
  
====== Reference textbooks ======
  
Jiawei Han and Micheline Kamber. [[http://www.cs.uiuc.edu/homes/hanj/bk2/|Data Mining: Concepts and Techniques]], 2nd ed. Morgan Kaufmann Publishers, 2006. (slides downloadable)

Xindong Wu et al. [[http://www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf|Top 10 algorithms in data mining]]. Knowledge and Information Systems 14:1–37, 2008.

====== Lecture slides ======
  
  * Data Mining & Knowledge Discovery {{:tdm:giannottipedreschi.mdmp.biss.09.pdf| Pedreschi}} {{:tdm:chap1_intro.pdf| Kumar, chapter 1}}
  * Preprocessing & data exploration {{:tdm:chap2_data.pdf| Kumar, chapter 2}} {{:tdm:chap3_data_exploration.pdf| Kumar, chapter 3}}
  * Cluster analysis {{:tdm:chap8_basic_cluster_analysis.pdf| Kumar, chapter 8}} {{:tdm:han.clustering.ppt| Han, chapter 7}}
  * Classification {{:tdm:chap4_basic_classification.pdf| Kumar, chapter 4}} {{:tdm:han.classification.ppt| Han, chapter 6}}
  * Frequent patterns and association rules {{:tdm:chap6_basic_association_analysis.pdf| Kumar, chapter 6}} {{:tdm:han.frequentpatterns.ppt| Han, chapter 5}}
  
**Frontiers of Data Mining research**
  * Privacy-preserving data mining {{:tdm:2._privacypreservingtechnologies_pedreschi_.pdf| Giannotti Pedreschi short tutorial}} {{:tdm:tutorialpakdd07-giannottipedreschi.pdf| Giannotti Pedreschi long tutorial}}
  * Graph mining and complex network analysis {{:tdm:wma.sna.pedreschi.1.pdf| Pedreschi 1}} {{:tdm:wma.sna.pedreschi.2.pdf| Pedreschi 2}} {{:tdm:wma.sna.pedreschi.3.pdf| Pedreschi 3}} {{:tdm:wma.sna.pedreschi.4.pdf| Pedreschi 4}}

====== Students ======

  - Aiello Luca Maria
  - Barbierato Enrico
  - Bosio Gianni
  - Camporesi Ferdinanda
  - Ferraioli Diodato
  - Ferreira Rui
  - Halder Raju
  - Kreautsevich Leanid
  - Leonardi Luca
  - Lutteri Emiliano
  - Madhavamandiram Rajan Deepak
  - Marengo Elisa
  - Mauro Jacopo
  - Mencagli Gabriele
  - Mezzetti Enrico
  - Muratori Ludovico Antonio
  - Nurrachmat Andi
  - Olivieri Chiara
  - Ottaviano Giuseppe
  - Panisson André
  - Panozzo Daniele
  - Pardini Luca
  - Peroni Silvio
  - Petrucci Andrea
  - Pomponiu Victor
  - Porreca Antonio Enrico
  - Pozzani Gabriele
  - Puech Matthias
  - Rama Aureliano
  - Rodolà Emanuele
  - Seraghiti Andrea
  - Spanò Alvise
  - Sugavam Swaminathan
  - Tolomei Gabriele
  - Triossi Andrea
  - Turroni Francesco
  - Vairo Claudio Francesco
  - Valsecchi Andrea
  - Vernero Fabiana
  - Vezzi Francesco
  - Visconti Alessia
  - Vitale Fabio
  - Zaccagnino Rocco
  - Zanioli Matteo

====== Exams ======

The exam for this course consists of a term paper reporting either:
  * a reasoned survey of a specific area of data mining research, or
  * a project consisting either of an analytical experiment on a challenging dataset or of the development of a data mining algorithm.

The exam can be carried out in teams, and its topic should preferably be close to the candidate's research interests, exploiting the interdisciplinary nature of data mining and knowledge discovery.

Students wishing to take the exam should send an email with subject [BISS09] to the instructor, specifying the chosen topic and the list of team members. Once the topic has been agreed with the instructor, the assigned work will be listed on this wiki, where the final report will also be published (in PDF format). The exam must be completed by the end of 2009.

----

**Project assignments**

  - Rocco Zaccagnino, Diodato Ferraioli (UniSA). Data Mining and Computer Music. Harmonic, melodic and rhythmic analysis of musical compositions through the extraction of meaningful information. Survey.
  - Emanuele Rodolà, Andrea Seraghiti, Andrea Petrucci (UniVE, UniVR). Application of LOF Method for Detecting Outliers in Range Scanner Datasets. Project.
  - Silvio Peroni (UniBO). Web page categorization via clustering. Project.
  - Gabriele Pozzani (UniVR). Spatio-temporal data mining. Survey.
  - Francesco Vezzi (UniUD). Data mining for bioinformatics. Survey.
  - Raju Halder, Luca Leonardi, Andrea Triossi, Matteo Zanioli. Feature detection in real-time frame-rate applications. Project.
  - Enrico Barbierato (UniTO). Implementation of a Naive Bayes / Bayesian Networks classifier. Project.
  - Andrea Valsecchi, Antonio Enrico Porreca. Anti-spam filter based on Naive Bayes classification. Project.
  - Daniele Panozzo, Chiara Olivieri. Recent developments in clustering techniques: Spectral and Kernel-Based Methods. Survey / project.
  - Francesco Turroni, Enrico Mezzetti, Jacopo Mauro, Ludovico Antonio Muratori. Multiclass text categorization with Support Vector Machines. Project.

tdm/biss09.1236517736.txt.gz · Last modified: 08/03/2009 at 13:08 by Dino Pedreschi