
Second exercise assignment, DM2

  • Sequential pattern analysis: WarLogs Dataset.
    Assigned on: 02.04.2014. To be completed by: 21.04.2014.
    Send papers (3 pages max of text, figures excluded) by email to datamining [dot] unipi [at] gmail [dot] com. Use “[DM] exercise 6” in the subject.
    Download the dataset here in CSV format: warlogs.csv.zip. A description of the variables is here.
    Problem: Build a dataset of sequences that describe, for each day and for each geographical area, the sequence of events that happened there. The geographical areas can be the ones indicated in the “region” attribute already in the dataset, or they can be obtained by partitioning the territory in some other way, for instance to obtain more balanced areas. The events can be represented, for instance, by the “category” or “type” attributes in the dataset, or they can be derived from other information (kind of casualties, number of wounded or killed victims, etc.). Use this dataset to extract a set of frequent sequential patterns.
    Tools for sequential patterns: among the possible alternatives, we suggest adopting one of the following:
    • Weka: use the GeneralizedSequentialPatterns associator. The input dataset should contain, for each line, a pair <sequence ID><Event ID>, and the lines should be temporally ordered (there is no explicit timestamp in the data). Here is an example: sequence_data.csv.zip.
    • Spam: a command-line tool that can be downloaded here (binaries for Windows and Linux, including a sample input file). Note that the input must contain only numeric (integer) values, so some encoding is needed. Also, input sequences longer than 64 transactions are not allowed, so they must be truncated.
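
The preprocessing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the records are invented placeholders, the `date` field name and its `YYYY-MM-DD HH:MM` format are hypothetical (only the “region” and “category” attributes are mentioned in the assignment), and all function names are made up for this example. It groups events by day and area, orders them in time, maps event labels to integers, truncates sequences to 64 events, and emits (sequence ID, event ID) pairs.

```python
from collections import defaultdict

MAX_LEN = 64  # limit on transactions per sequence stated in the assignment for Spam


def build_sequences(records, area_key="region", event_key="category", time_key="date"):
    """Group events by (day, area) and order each group by timestamp.

    Assumes (hypothetically) that rec[time_key] is a 'YYYY-MM-DD HH:MM' string,
    so the first 10 characters give the day and string order is time order.
    """
    groups = defaultdict(list)
    for rec in records:
        day = rec[time_key][:10]
        groups[(day, rec[area_key])].append((rec[time_key], rec[event_key]))
    return {key: [event for _, event in sorted(events)]
            for key, events in sorted(groups.items())}


def encode_events(sequences):
    """Replace event labels with integer codes (1, 2, 3, ...) in order of appearance."""
    codes = {}
    encoded = {}
    for key, events in sequences.items():
        encoded[key] = [codes.setdefault(event, len(codes) + 1) for event in events]
    return encoded, codes


def truncate(sequences, max_len=MAX_LEN):
    """Keep only the first max_len events of every sequence."""
    return {key: seq[:max_len] for key, seq in sequences.items()}


def to_pairs(sequences):
    """Flatten into temporally ordered (sequence ID, event ID) pairs, one per line."""
    return [(seq_id, event)
            for seq_id, seq in enumerate(sequences.values(), start=1)
            for event in seq]


# Invented placeholder records, not taken from the WarLogs dataset.
records = [
    {"date": "2004-01-01 09:30", "region": "A", "category": "ied"},
    {"date": "2004-01-01 08:00", "region": "A", "category": "attack"},
    {"date": "2004-01-01 10:00", "region": "B", "category": "patrol"},
    {"date": "2004-01-02 07:15", "region": "A", "category": "attack"},
]

sequences = build_sequences(records)
encoded, codes = encode_events(sequences)
pairs = to_pairs(truncate(encoded))
for seq_id, event_id in pairs:
    print(seq_id, event_id)
```

The pair layout follows the description given for the Weka input; the exact file layout expected by Spam is not specified here and should be checked against the sample input file shipped with the tool.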

Ignore everything below this line

Day | Room | Topic | Learning material | Instructor
1. 17.02.2014 9:00-11:00 | N1 | Introduction | | Giannotti
2. 19.02.2014 9:00-11:00 | L1 | Frequent patterns and association rules / 1 | | Giannotti
3. 24.02.2014 9:00-11:00 | N1 | Frequent patterns and association rules / 2 | | Giannotti
4. 26.02.2014 9:00-11:00 | L1 | Frequent patterns and association rules / 3 | | Giannotti
5. 3.03.2014 9:00-11:00 | N1 | Association rules on DM tools | | Giannotti
6. 5.03.2014 9:00-11:00 | L1 | Sequential patterns / 1 | | Nanni
7. 10.03.2014 9:00-11:00 | N1 | Sequential patterns / 2 | | Nanni
8. 12.03.2014 9:00-11:00 | L1 | Time series / 1 + Data exploration: assignments | | Nanni
9. 17.03.2014 9:00-11:00 | N1 | Time series / 2 | | Nanni
10. 19.03.2014 9:00-11:00 | L1 | Classification: evaluation methods + Case study: Fraud detection | | Giannotti
11. 24.03.2014 9:00-11:00 | N1 | Network diffusion and Virality Marketing | | Giannotti
12. 26.03.2014 9:00-11:00 | L1 | Mobility Data Mining / 1 | | Nanni
13. 7.04.2014 9:00-11:00 | N1 | Mobility Data Mining / 2 | | Nanni
14. 9.04.2014 9:00-11:00 | L1 | Case study: Mobility Data Mining | | Nanni
15. 14.04.2014 9:00-11:00 | N1 | Case study: Mobility Data Mining / 2 | | Giannotti - Nanni
16. 16.04.2014 9:00-11:00 | L1 | Data exploration: results of assignments + Presentation of projects | | Nanni
17. 28.04.2014 9:00-11:00 | N1 | Data Mining and Privacy / 1 | | Giannotti
18. 30.04.2014 9:00-11:00 | L1 | Case study: Mining official data and health data | | Nanni
19. 5.05.2014 9:00-11:00 | N1 | Data Mining and Privacy / 2 | | Giannotti
20. 7.05.2014 9:00-11:00 | L1 | | |
21. 12.05.2014 9:00-11:00 | N1 | | |
22. 14.05.2014 9:00-11:00 | L1 | | |
23. 19.05.2014 9:00-11:00 | N1 | | |
24. 21.05.2014 9:00-11:00 | L1 | | |
25. 27.05.2014 9:00-11:00 | N1 | | |
dm/temp.txt · Last modified: 04/04/2014 at 07:30 by Mirco Nanni