magistraleinformatica:ir:ir16:start

Differences

These are the differences between the selected revision and the current version of the page.


magistraleinformatica:ir:ir16:start [29/11/2016 at 09:12 (8 years ago)]
Paolo Ferragina
magistraleinformatica:ir:ir16:start [05/09/2017 at 13:10 (7 years ago)] (current version)
Paolo Ferragina [Exam]
Line 24: Line 24:
 ===== Exam ===== ===== Exam =====
  
-The exam will consist of a written test plus an oral discussion on the exercises. +The exam will consist of two parts: (1) a lab test on the libraries and tools learned in class; (2) a written test plus an oral discussion on the exercises. 
  
 ^ Date         ^ Room ^ Text ^ ^ Date         ^ Room ^ Text ^
-| 12/01/2017 |  L1 (9:00)  | text | +| 10/01/2017 |  L1 (9:30)  | {{:magistraleinformatica:ir:ir16:ir170110_lab.docx|Lab test}}, {{:magistraleinformatica:ir:ir16:papers.xml.gz|data}} | 
-| 03/02/2017 |  L1 (9:00)  | text | +| 12/01/2017 |  L1 (9:00)  | {{:magistraleinformatica:ir:ir16:ir170112.docx|text}} |
- +| 01/02/2017 |  L1 (9:30)  | {{:magistraleinformatica:ir:ir16:ir170201_lab.docx|Lab test}} | 
 +| 12/06/2017 |  L1 (15:00)  | {{ :magistraleinformatica:ir:ir16:ir170612.docx |text + written test on Lucene}} |
 +| 29/06/2017 |  A (9:00)  | {{ :magistraleinformatica:ir:ir16:ir170629.docx |text + written test on Lucene}} | 
 +| 27/07/2017 |  L1 (9:00)  | {{ :magistraleinformatica:ir:ir16:ir170727.docx |text + written test on Lucene}} | 
 +| 05/09/2017 |  N1 (15:00)  | {{ :magistraleinformatica:ir:ir16:ir170905.docx |text}} |
 =====  Books ===== =====  Books =====
  
Line 48: Line 51:
 | 11/10/16 | Exact-duplicate documents: Karp-Rabin's rolling hash (with properties and error probability). Introduction to Locality-Sensitive Hashing (LSH): similarity problem among users based on binary/real features, cosine similarity among vectors of real-valued features, near-duplicate document similarity via shingles + min-wise hashing + Jaccard similarity. | {{:magistraleinformatica:ir:ir16:lect_06-lsh.ppt|Slides}}.\\ Sect 19.6 of [MRS] |  | 11/10/16 | Exact-duplicate documents: Karp-Rabin's rolling hash (with properties and error probability). Introduction to Locality-Sensitive Hashing (LSH): similarity problem among users based on binary/real features, cosine similarity among vectors of real-valued features, near-duplicate document similarity via shingles + min-wise hashing + Jaccard similarity. | {{:magistraleinformatica:ir:ir16:lect_06-lsh.ppt|Slides}}.\\ Sect 19.6 of [MRS] | 
 | 12/10/16 | Locality-sensitive hashing: basics, Hamming distance, Jaccard similarity, cosine similarity, sketch of the main theorem. |  |  | 12/10/16 | Locality-sensitive hashing: basics, Hamming distance, Jaccard similarity, cosine similarity, sketch of the main theorem. |  | 
-| 18/10/16 | Posting list compression, codes: gamma, variable-byte, PForDelta and Elias-Fano. | {{:magistraleinformatica:ir:ir16:lect_07a-compression_integers.ppt|Slides}}.\\ Sect 5.3 of [MRS] and {{:magistraleinformaticanetworking/ae/ae2014/chap_9.pdf|Ferragina's notes}} (only the coders presented in class).  | +| 18/10/16 | Posting list compression, codes: gamma, delta, variable-byte, PForDelta and Elias-Fano. | {{:magistraleinformatica:ir:ir16:lect_07a-compression_integers.ppt|Slides}}.\\ Sect 5.3 of [MRS] and {{:magistraleinformaticanetworking/ae/ae2014/chap_9.pdf|Ferragina's notes}} (only the coders presented in class).  |
 | 19/10/16 | Rank and Select data structures, two approaches: the case of B untouched and extra o(B) bits, and the case of Elias-Fano's approach with B compressed. Succinct representation of binary trees and its navigational operations (heap-like notation). | {{:magistraleinformatica:ir:ir16:lect_7b-succinct-tree.pptx|Slides}}.  | | 19/10/16 | Rank and Select data structures, two approaches: the case of B untouched and extra o(B) bits, and the case of Elias-Fano's approach with B compressed. Succinct representation of binary trees and its navigational operations (heap-like notation). | {{:magistraleinformatica:ir:ir16:lect_7b-succinct-tree.pptx|Slides}}.  |
 | 25/10/16 | Exact search: hashing with chaining, universal hashing, cuckoo hashing. Prefix search: compacted trie, front coding, 2-level indexing. Edit distance via a brute-force approach or via Dynamic Programming (possibly weighted). | {{:magistraleinformatica:ir:ir16:lect_08-dict_search_-_part_a.ppt|Slides}}. | | 25/10/16 | Exact search: hashing with chaining, universal hashing, cuckoo hashing. Prefix search: compacted trie, front coding, 2-level indexing. Edit distance via a brute-force approach or via Dynamic Programming (possibly weighted). | {{:magistraleinformatica:ir:ir16:lect_08-dict_search_-_part_a.ppt|Slides}}. |
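As an illustration of the integer coders listed for 18/10/16, here is a minimal Python sketch of Elias gamma, Elias delta and variable-byte coding. It is not taken from the course slides: codes are produced as strings of '0'/'1' characters for readability (a real posting-list coder would pack bits into machine words), and the variable-byte part assumes the convention of Sect 5.3 of [MRS], where the high bit flags the last byte of each integer. The coders are applied to d-gaps, i.e. differences between consecutive docIDs.

```python
from __future__ import annotations


def gamma_encode(x: int) -> str:
    """Elias gamma: len(bin(x)) - 1 zeros followed by bin(x); defined for x >= 1."""
    if x < 1:
        raise ValueError("gamma codes are defined for positive integers only")
    b = bin(x)[2:]
    return "0" * (len(b) - 1) + b


def gamma_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one gamma code starting at bits[pos]; return (value, next_pos)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end


def delta_encode(x: int) -> str:
    """Elias delta: gamma code of len(bin(x)), then bin(x) without its leading 1."""
    if x < 1:
        raise ValueError("delta codes are defined for positive integers only")
    b = bin(x)[2:]
    return gamma_encode(len(b)) + b[1:]


def delta_decode(bits: str, pos: int = 0) -> tuple[int, int]:
    """Decode one delta code starting at bits[pos]; return (value, next_pos)."""
    length, pos = gamma_decode(bits, pos)
    end = pos + length - 1
    return int("1" + bits[pos:end], 2), end


def decode_stream(bits: str, decoder) -> list[int]:
    """Decode a concatenation of codes produced by the matching encoder."""
    values, pos = [], 0
    while pos < len(bits):
        value, pos = decoder(bits, pos)
        values.append(value)
    return values


def vbyte_encode(x: int) -> bytes:
    """Variable-byte: 7 payload bits per byte, most significant group first;
    the high bit is set only on the last byte of each integer."""
    if x < 0:
        raise ValueError("variable-byte codes non-negative integers only")
    groups = []
    while True:
        groups.append(x & 0x7F)
        x >>= 7
        if x == 0:
            break
    groups.reverse()
    groups[-1] |= 0x80
    return bytes(groups)


def vbyte_decode(data: bytes) -> list[int]:
    """Decode a concatenation of variable-byte codes."""
    values, current = [], 0
    for byte in data:
        current = (current << 7) | (byte & 0x7F)
        if byte & 0x80:  # high bit set: this byte ends the current integer
            values.append(current)
            current = 0
    return values


# docIDs 3, 8, 21, 23 are stored as the gaps 3, 5, 13, 2
gaps = [3, 5, 13, 2]
gamma_stream = "".join(gamma_encode(g) for g in gaps)
print(gamma_stream)                               # 011001010001101010
print(decode_stream(gamma_stream, gamma_decode))  # [3, 5, 13, 2]
print(vbyte_encode(824).hex())                    # 06b8
```

Gamma and delta are prefix-free, which is what lets ''decode_stream'' split a concatenated stream without separators; delta pays off over gamma for larger gaps, since its length prefix is itself gamma-coded.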
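The Dynamic Programming approach to edit distance mentioned for 25/10/16 can be sketched in Python as follows, assuming the usual operations (insertion, deletion, substitution) with user-chosen costs; the parameter names ''ins'', ''dele'' and ''sub'' are hypothetical, and with all costs equal to 1 this is the classic Levenshtein distance. Cell D[i][j] holds the cheapest way to turn the first i characters of a into the first j characters of b, so the table is filled in O(|a|·|b|) time instead of enumerating edit sequences by brute force.

```python
def edit_distance(a: str, b: str, ins=1, dele=1, sub=1):
    """Weighted edit distance between a and b via dynamic programming.

    D[i][j] is the minimum cost of transforming a[:i] into b[:j].
    """
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * dele  # delete all of a[:i]
    for j in range(1, m + 1):
        D[0][j] = j * ins   # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match_cost = 0 if a[i - 1] == b[j - 1] else sub
            D[i][j] = min(
                D[i - 1][j] + dele,            # delete a[i-1]
                D[i][j - 1] + ins,             # insert b[j-1]
                D[i - 1][j - 1] + match_cost,  # match or substitute
            )
    return D[n][m]


print(edit_distance("kitten", "sitting"))         # 3
print(edit_distance("kitten", "sitting", sub=2))  # 5
```

With ''sub=2'' a substitution costs as much as a deletion plus an insertion, so the result only counts characters outside a longest common subsequence.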
Line 60: Line 63:
 | 29/11/16 | Semantic-annotation tools: basics, Wikipedia structure, TAGME and other annotators. How to evaluate those systems.  | {{:magistraleinformatica:ir:ir16:lect_14-topic_annotators.pptx|Slides}}. |  | 29/11/16 | Semantic-annotation tools: basics, Wikipedia structure, TAGME and other annotators. How to evaluate those systems.  | {{:magistraleinformatica:ir:ir16:lect_14-topic_annotators.pptx|Slides}}. | 
 | 30/11/16 | Various approaches to text representation and their applications.  |  |  | 30/11/16 | Various approaches to text representation and their applications.  |  | 
-| 06/12/16 | Hands-on Lab.\\ You need to configure your laptop as follows: a Linux system (possibly a virtual machine) with a Debian-like OS (e.g. ''Ubuntu 16.10''), a working Internet connection from the Polo's room, at least 5GB of free disk space and 2GB of RAM, and ''scrapy'', ''pylucene'' and ''lxml'' installed (this can be done with ''sudo apt-get update && sudo apt-get install scrapy python-lucene python-lxml''). We have created a script to check whether your system has everything in place for the lab. You may run it ('python {{:magistraleinformatica:ir:ir16:lib_test.py.zip|lib_test.py}}') on your system: it must run to the end without errors. \\ In collaboration with Marco Cornolti (cornolti@di.unipi.it).  | [[https://docs.google.com/presentation/d/1iXjtu_AduB-_CqsV2ye8M0q9_BHiCosKv9XcucZU-No/edit?usp=sharing | Slides (crawling)]] | +| 06/12/16 | Hands-on Lab.\\ You need to configure your laptop as follows: a Linux system (possibly a virtual machine) with a Debian-like OS (e.g. ''Ubuntu 16.10''), a working Internet connection from the Polo's room, at least 5GB of free disk space and 2GB of RAM. You need to install the following Python packages: ''scrapy'', ''pylucene'' (version 3.5; newer versions are not supported) and ''lxml''. They can be installed with: \\ ''sudo apt-get update'' \\ ''sudo apt-get install python-scrapy python-lucene python-lxml'' \\ We have created {{:magistraleinformatica:ir:ir16:lib_test.py.zip|the lib_test.py script}} to check whether your system has everything in place for the lab. You may run it (''python lib_test.py'') on your system: it must run to the end without errors. \\ In collaboration with Marco Cornolti (cornolti@di.unipi.it).  | [[https://docs.google.com/presentation/d/1iXjtu_AduB-_CqsV2ye8M0q9_BHiCosKv9XcucZU-No/edit?usp=sharing | Slides (crawling)]] | 
 | 07/12/16 | Hands-on Lab.\\ Introduction to Lucene.\\ In collaboration with Marco Cornolti (cornolti@di.unipi.it).  | [[ https://docs.google.com/presentation/d/1JlZKfWW85Q5atTLPRieWWOmpEKZBWc1Zx6PL2ReywWY/edit?usp=sharing | Slides (Lucene)]] |  | 07/12/16 | Hands-on Lab.\\ Introduction to Lucene.\\ In collaboration with Marco Cornolti (cornolti@di.unipi.it).  | [[ https://docs.google.com/presentation/d/1JlZKfWW85Q5atTLPRieWWOmpEKZBWc1Zx6PL2ReywWY/edit?usp=sharing | Slides (Lucene)]] | 
 | 13/12/16 | Hands-on Lab.\\ Advanced topics and use of Lucene. \\ In collaboration with Marco Cornolti (cornolti@di.unipi.it).  | [[ https://docs.google.com/presentation/d/1PkyCieHbxLpFkz5uIPb_5g5j-oNHxHgz60XtWRJhrwc/edit?usp=sharing | Slides (Lucene's fields)]] |  | 13/12/16 | Hands-on Lab.\\ Advanced topics and use of Lucene. \\ In collaboration with Marco Cornolti (cornolti@di.unipi.it).  | [[ https://docs.google.com/presentation/d/1PkyCieHbxLpFkz5uIPb_5g5j-oNHxHgz60XtWRJhrwc/edit?usp=sharing | Slides (Lucene's fields)]] | 
-| 14/12/16 | Clustering: flat, hierarchical, soft, hard. K-means, optimal bisect, hierarchical (max, min, avg, centroid). | Slides.\\ Chap 16 and 17 of [MRS]. | +| 14/12/16 | Exercises |  |
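In the spirit of the lab-setup check described for 06/12/16, a hypothetical Python 3 sketch of such a check might look as follows. It is not the course's lib_test.py, whose contents are not reproduced here: it only tests that the modules ''scrapy'', ''lucene'' (the import name of PyLucene) and ''lxml'' can be found, and that the disk and RAM thresholds stated in that row are met (GB is taken as 1024^3 bytes, an assumption). Note that the ''python-*'' Debian packages named in that row target Python 2, so on such a setup the modules are visible to a Python 2 interpreter rather than to Python 3. The RAM check relies on POSIX ''os.sysconf'' names available on Linux.

```python
import importlib.util
import os
import shutil

REQUIRED_MODULES = ["scrapy", "lucene", "lxml"]  # PyLucene is imported as `lucene`
MIN_FREE_DISK_GB = 5
MIN_RAM_GB = 2
GB = 1024 ** 3


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def free_disk_gb(path="."):
    """Free space, in GB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / GB


def total_ram_gb():
    """Physical RAM in GB via POSIX sysconf (available on Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / GB


def check_environment():
    """Collect a human-readable description of every unmet requirement."""
    problems = [f"missing Python module: {name}"
                for name in missing_modules(REQUIRED_MODULES)]
    if free_disk_gb() < MIN_FREE_DISK_GB:
        problems.append(f"less than {MIN_FREE_DISK_GB} GB of free disk space")
    if total_ram_gb() < MIN_RAM_GB:
        problems.append(f"less than {MIN_RAM_GB} GB of RAM")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("PROBLEM:", issue)
    print("OK" if not issues else f"{len(issues)} problem(s) found")
```

Using ''importlib.util.find_spec'' only locates the modules without importing them; PyLucene additionally needs ''lucene.initVM()'' before use, which a fuller check would have to exercise.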
  
  
magistraleinformatica/ir/ir16/start.1480410732.txt.gz · Last modified: 29/11/2016 at 09:12 (8 years ago) by Paolo Ferragina