Strumenti Utente

Strumenti Sito


magistraleinformaticanetworking:ae:ae2016:start

Questa è una vecchia versione del documento!


Algorithm Engineering A.A. 2016-2017

Thanks to P. Sanders Teachers: Paolo Ferragina and Linda Pagli

CFU: 9 (second semester).

Course ID: 531AA.

Language: English.

Degree: Master degree in CS&Networking in collaboration with the Scuola Normale Superiore “S. Anna”.

Question time: After lectures and by appointment.

News: about this course will be distributed via a Tweeter-channel.

Official lectures schedule: The schedule and content of the lectures is available below and with the official registro.

Next student meetings (summer time)
6 June, hr 9:00, Ferragina's office
13 June, hr 9:00, Ferragina's office
22 June, hr 9:00, Ferragina's office
28 June, hr 17:30, Ferragina's office
Next meetings in July are via Skype/Hangout (since I'll be out of Italy), pls take an appointment via email.

Goals

In this course we will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. The design and analysis will involve several models of computation— such as RAM, 2-level memory, cache-oblivious, streaming— in order to take into account the architectural features and the memory hierarchy of modern PCs and the availability of Big Data upon which those algorithms could work on. We will add to such theoretical analysis several other engineering considerations spurring from the implementation of the proposed algorithms and from experiments published in the literature.

Every lecture will follow a problem-driven approach that starts from a real software-design problem, abstracts it in a combinatorial way (suitable for an algorithmic investigation), and then introduces algorithms aimed at minimizing the use of some computational resources like time, space, communication, I/O, energy, etc. Some of these solutions will be discussed also at an experimental level, in order to introduce proper engineering and tuning tools for algorithmic development.

Week Schedule of the Lectures

Week Schedule
Day Time Slot Room
Tuesday 9:00 - 11:00 A1
Wednesday 9:00 - 11:00 C1
Friday 11:00 - 13:00 B

Exam

Dates Room Text
12/06/2017 L1 (15:00) text
29/06/2017 A (9:00) text
27/07/2017 ( date changed !!! ) L1 (9:00) text
05/09/2017 ( date changed !!! ) N1 (15:00) text

Background

I strongly suggest to refresh your knowledge about basic Algorithms and Data Structures by looking at the well-known book Introduction to Algorithms, Cormen-Leiserson-Rivest-Stein (third edition). Specifically, I suggest you to look at the chapters 2, 3, 4, 6, 7, 8, 10, 11 (no perfect hash), 12 (no randomly built), 15 (no optimal BST), 18, 22 (no strongly connected components). Also, you could look at the Video Lectures by Erik Demaine and Charles Leiserson, specifically Lectures 1-7, 9-10 and 15-17.

Books, Notes, etc.

I'll use just the old-fashioned blackboard. Most of the content of the course will be covered by some notes I wrote in these years; for some topics I'll use parts of papers/books.

Lectures (preliminary schedule and links)

Date Lecture Biblio Slides
21/02/2017 Introduction to the course. Models of computation: RAM, 2-level memory. An example of algorithm analysis: the sum of n numbers. The role of the Virtual Memory system. Chap. 1 of the notes. Lecture0
22/02/2017 Maximum sub-array sum in 1d and its variations. Chap. 2 of the notes, about sect 2.5: first page and half. About the algorithmic solution of the density-based problem only hints of the basic ideas. Lecture1
24/02/2017 Random sampling: disk model, known length, definition of the streaming model. Chap. 3 of the notes. Lecture2
28/02/2017 Random sampling on the streaming model, known and unknown length. Reservoir sampling. List Ranking: definition of the problem, RAM solution.
01/03/2017 List Ranking: difficulties on disk, pointer-jumping technique, I/O-efficient simulation. Divide-and-conquer technique for algorithm design and Master Theorem for solving recurrent relations. Chap. 4 of the notes. CLRS cap. 2.3.1, 4.5 Lecture3-4
Students are warmly invited to refresh their know-how about: Divide-and-conquer technique for algorithm design and Master Theorem for solving recurrent relations; and Binary Search Trees Lecture 2, 9 and 10 of Demaine-Leiserson's course at MIT
03/03/2017 List Ranking: divide-and-conquer approach with its analysis. Selecting an independent set with randomization and deterministic coin tossing. Sorting atomic items: sorting vs permuting, comments on the time and I/O bounds, binary merge-sort and its bounds. lec.5
07/03/2017 Snow Plow and compression. Multi-way mergesort. Lower bounds for sorting. Chap. 5 of the notes lec.6
08/03/2017 The case of D>1 disks: non-optimality of multi-way MergeSort, the disk-striping technique. Quicksort: recap on best-case, worst-case. Average-case with analysis. Selection of k-th ranked item in linear average time (with proof). 3-way partition for better in-memory quicksort. Lower bound of Permuting is optional (sect 5.2.2). lec.7
10/03/2017 Quicksort: Bounded-space quicksort. Multi-way Quicksort for disk, multi-pivot selection, oversampling (with proof). lec.8
14/03/2017 Recap: graphs and their representations, BFS and DFS visits, computing shortest-path over unary-length edges. Chap 22.1-22.3 of CLR. Do not study Lemma 22.5. lec.9
15/03/2017 Recap: Topological sort. Minimum Spanning Tree problem: definition, Greedy approach. Kruskal and Prim algorithms and analysis. Chap 22.4 and 23 of CLR. lec.10
17/03/2017 Shortest Path problem: Dijkstra's algorithm. Algorithms for external and semi-external computation of MST. For Shortest Path look at this note. For external MST, look at Sect 11.5 of the Mehlhorn-Sander's book. lec.11
21/03/2017 Fast set intersection, various solutions: scan, sorted merge, binary search, mutual partition, binary search with exponential jumps. Chap. 8 of the notes.
22/03/2017 Fast set intersection: two-level scan, random shuffling. String sorting: comments on the difficulty of the problem on disk, lower bound. LSD-radix sort with proof of time complexity and correctness. Chap. 6 of the notes.
24/03/2017 MSD-radix sort and the trie data structure. Multi-key Quicksort. Ternary search tree. Interpolation search.
28/03/2017 Randomized data structures: Treaps and Skip lists (with proofs and comments on string items). Notes by others. Study also Theorems and Lemmas. see Demaine's lecture num. 12 on skip lists.
Students are warmly invited to refresh their know-how about: hash functions and their properties; hashing with chaining. Lectures 7 of Demaine-Leiserson's course at MIT
29/03/2017 Hashing and dictionary problem: direct addessing, simple hash functions, hashing with chaining, uniform hashing and its computing/storage cost, universal hashing (definition and properties). Two examples of Universal Hash functions: one with correctness proof, the other without. Chap. 7 of the notes. Theorem 7.3 without proof, Theo 7.5 without proof (only the statements).
31/03/2017 The lecture of today has been canceled because of a strike by personnel of the Polo Fibonacci (rooms closed)
04/04/2017 Perfect hash table (with proof). Minimal ordered perfect hashing: definition, properties, construction, space and time complexity.
05/04/2017 Cuckoo hashing (with proof). Bloom Filter: properties, construction, query operation (with proofs). Do not read subsections 7.7.1-7.7.3.
21/04/2017 Auto-completion search. RMQ and LCA queries, equivalence and reductions, their algorithmic solutions and few applications. Slides (no solution with linear space - 4 russian trick: page 10-29 starting from “The second data structure is built to efficiently answer RMQ-queries in which i and j are in the same block Dk.”)
To synchro the notes, here you can find the latest versions of the chapters of this first part of the course. Updated notes
28/04/2017 Prefix search: definition of the problem, solution based on arrays, Front-coding, two-level indexing. Compacted tries. Analysis of space, I/Os and time of the prefix search for all data structures seen in class. More on two-level indexing of strings: Solution based on Patricia trie, with analysis of space, I/Os and time of the prefix search. Chap. 9 of the notes: 9.1, 9.4 and 9.5.
02/05/2017 Substring search: definition, properties, reduction to prefix search. The Suffix Array. Binary searching the Suffix Array: p log n. Searching in Suffix Arrays with p + log n. Suffix Array construction via qsort and its asymptotic analysis. Chap. 10 of the notes: 10.1, 10.2.1 and 10.2.2, 10.2.3 (no “The skew algorithm”, “The Scan-based algorithm”).
03/05/2017 LCP array construction in linear time. Suffix Trees: properties, structure, pattern search, space occupancy. Construction of Suffix Trees from Suffix Arrays and LCP arrays, and vice versa. Sect 10.3, 10.3.1, 10.3.2
05/05/2017 Text mining use of suffix arrays. The k-mismatch problem with SA+LCP or ST and RMQ data structure. Sect 10.4.1 and 10.4.3
09/05/2017 Esercitazione
10/05/2017 Esercitazione
11/05/2017 Esercitazione
12/05/2017 Prefix-free codes, notion of entropy, optimal codes. Integer coding: the problem and some considerations. The codes Gamma and Delta, space/time performance and consideration on optimal distributions. Rice, PForDelta. Chap. 11 of the notes
16/05/2017 Coders: (s,c)-codes, variable-byte, Interpolative. With examples.
17/05/2017 Example on Interpolative and Elias-Fano.
18/05/2017 Huffman, with optimality (proof). Canonical Huffman: construction, properties, decompression algorithm. Chap. 12 of the notes (no sect 12.1.2).
19/05/2017 Arithmetic coding: properties, algorithm and proofs; with examples. No PPM and Range coding.
22/05/2017 Dictionary-based compressors: properties and algorithmic structure. LZ77, LZSS and gzip. LZ8 and LZW. LZ parsing with suffix trees and LCA queries. Chap 13, no par 13.4, and Chap 10 at sect 10.4.2
23/05/2017 Burrows-Wheeler Transform (forward and backward), Move-To-Front, Run-Length-Encoding, bzip2. Chap. 14 of the notes. Do not study Theorem 14.5 and sections 14.4 and 14.5.
24/05/2017 Exercises on data compression.
29/05/2017 Recap on the arguments of the course.
magistraleinformaticanetworking/ae/ae2016/start.1500190084.txt.gz · Ultima modifica: 16/07/2017 alle 07:28 (7 anni fa) da Paolo Ferragina