Year 2015-2016

Prof. Roberto Grossi
Tutor: Dr. Filippo Geraci

Announcements

Results and scheduling of the oral examinations in the following days: Tue, Jan 12, 2016, at 9:00 in my office.
New exercises (Dec. 15) added to the partial list of problems.
Next terms: Jan. 20 at 9:00 in room L1; Feb. 10 at 9:00 in room L1.
Office hours: Tue 14-16 (Dipartimento di Informatica)

Overview

We will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. This course deepens and extends the algorithmic notions of students.

The syllabus is structured to highlight the applicative scenarios in which the studied algorithms and data structures can be successfully applied. The level of detail with which each argument will be dealt with can change year-by-year, and will be decided according to requests coming from other courses and/or specific issues arising in, possibly novel, applicative scenarios.

The advanced nature of this course focuses on developing algorithmic design skills. The students are exposed to complex problems that require a significant effort in problem solving. One “brainstorming” lecture per week is devoted to a full immersion in this activity, and it is highly recommend to attend it. The purpose is not that of teaching/learning further N algorithms, but to refine students' skills. The final written exam will be based on the topics discussed during the “brainstorming” lectures.

We like students that make mistakes in a constructive way, since this is the starting point of our brainstorming to drive their reasoning paths, thus learning by mistakes. We are interested in the route that leads to the algorithmic solution, rather than the solution itself. Regarding mistakes and intelligence, Alan Turing wrote in 1947 that “… if a machine is expected to be infallible, it cannot also be intelligent”. This is a motto for this class, with a kind invitation to express your own ideas even if they could be apparently wrong.

Topics

Please note that several topics are the outcome of recent advancement in algorithms and data structures, and thus most of the course material consists in research papers or book chapters.

Date	Topics	References and notes
Sept. 22	Introduction to the course. Entrance exam.	-
Sept. 24	Discussion of the solutions for the exercises of the entrance exam.	see snapshots
Sept. 28	Problem solving. Statistics on arrays (top-k elements).	see snapshots
Sept. 29	Linear-time worst-case selection.	par.9.3

Randomization, hashing and data streaming

Randomization is now a common powerful tool to solve large-scale problems. After introducing the concept of randomized algorithms and hashing, we apply them to solve data streaming problems, a field emerged in the last decade. Here data flow as a stream and one-pass algorithms with limited memory can process it. We focus on the count-min sketch paradigm and its applications. [Note: to refresh the basic notions on counting and probability, please refer to Appendix C in Cormen-Leiserson-Rivest-Stein's book “Introduction to Algorithms”, 3rd ed., MIT Press.]

Date	Topics	References and notes
Oct. 1	Las Vegas and Montecarlo algorithms. Indicator variables and randomised quicksort (Las Vegas).	par. 7.3
Oct. 5	Problem solving. Randomized selection (Las Vegas).	list of problems
Oct. 6	Random indicator variables: secretary problem (and random permuting), birthday paradox.	par. 5.1-5.3, 5.4.1
Oct. 8	Karp-Rabin fingerprints: randomized checking and pattern matching (Montecarlo and Las Vegas)	par.7.4-7.6
Oct. 12	Randomized min-cut in unweighted graphs (Montecarlo)	par.1.1
Oct. 13	Randomness and Kolmogorov complexity	Notes
Oct. 15	Lecture by prof. Ferragina: issue of hierarchical memories: I/O-model. Index construction: multi-way mergesort, BSBI and SPIMI. Sketch on MapReduce. Distributed indexing: Term-based vs Doc-based partitioning.	Slides Chapt.4
Oct. 19	Problem solving. Randomized min-cut in graphs.	list of problems
Oct. 20	General assembly
Oct. 22	Data Streaming Model. Motivations and examples. Count-Min Sketches	sects.1-3 Site Notes
Oct. 26	Problem solving. Data streaming.	list of problems
Oct. 27	Chernoff bounds, and applications of count-min sketches.	par.4.1,Th.4.1 (proof is optional) sect.4 Notes
Oct. 29	Using a family of random hash functions: Cuckoo hashing.	Notes
Nov. 9	Problem solving. Hash functions and data streaming.	list of problems

Algorithms for massive data sets and hierarchical memories

Massive data sets are one of the key elements in our so-called big data society. In this scenario the design and analysis of algorithms and data structures adopt the external and cache-oblivious models, which take into account the memory hierarchy that provides performance to modern computers. We study basic problems on sorting and searching in these models, comparing the results with the classical RAM model of computation.

Date	Topics	References and notes
Nov. 10	Review of the external memory model (EMM): sequential access versus random access, I/O complexity. Static and dynamic searching in external memory. (a,b)-trees and B-trees: definition, search and insert operations.	Introduction, chapt.2 and 3 (set D=1 disks)sect.1 and 2
Nov. 12	Dynamic searching in external memory: (a,b)-trees and B-trees. Deletion, construction, and I/O analysis. Lower bound for the I/O complexity of sorting.	sect.2 Notes
Nov. 16	Problem solving. Hash functions and data streaming.	list of problems
Nov. 17	Cache-oblivious (CO) model. Difference from the EM model and examples.	Sect.1-3
Nov. 17	Problem solving. Algorithms for external memory (continued).	list of problems
Nov. 19	Cache-oblivious searching: van Emde Boas layout and CO B-trees.	[Sect.4.1]
Nov. 23	Problem solving. CO algorithms.	list of problems
Nov. 24	Cache-oblivious layout of binary trees.	Notes Sect.5 (see 5.4)
Nov. 26	Problem solving. CO algorithms (continued).	list of problems
Nov. 30	Map-Reduce paradigm. Suffix array construction (DC3).	paper example DC3 algorithm (up to Section3)

Enumeration, hardness and approximation of some combinatorial problems

NP-hard problems are important but difficult to solve and no deterministic polynomial algorithms are currently known for them. Even basic problems, such as finding a simple path of vertices in a graph, are NP-hard. We discuss how to attack these problems (a) by counting and listing their solutions where the cost is proportional to the output (it can be exponential in the worst case but we only pay for what we get) and (b) by approximating their solutions with polynomial-time algorithms when the problem requires to minimize a cost or maximize a benefit. Most of the addressed problems are on graphs, which are popular representations for modern networked data.

Date	Topics	References and notes
Dec. 1	Quick review of NP-completeness and NP-hardness. Euler tour vs Hamiltonian tour vs TravelSalesman Problem (TSP).	[CLRS, chap.34 (up to 34.4 plus 34.5.4, plus 35.2.2]
Dec. 1	Problem solving. MapReduce. Euler tour.	list of problems
Dec. 3	2-approximation for metric TSP and for Min Vertex Cover.	[CLRS, 35.1, 35.2]
Dec. 7	Approximations for the bin packing problem.	chapt.2: par. 2.2.2
Dec. 10	Problem solving. Hard problems.	list of problems
Dec. 14	Approximation for the MAX-CUT and knapsack problems.	Notes chapt.2: par. 2.1.1
Dec. 15	Problem solving. Approximation algorithms.	list of problems

Activity in class

This is the final list of problems.
Some of the screen snapshots shown during the classes.

Official documents for the course

Access to unimap log (registro delle lezioni).
Access to the course evaluation form (questionario studenti).

Spot yourself in the classroom

Examination outcomes (in Italian)

Examination of Dec. 16, 2015

matricola	score	e1	e2	e3	notes
300759	-	-	-	-	esercitazione
441517	29	10	9	10	e2: non è molto chiara la costruzione.
452058	20	4	8	8	e1: only the first point done; e2: search and analysis missing; e3: missing generalization
465982	21	7	7	7	e1:non chiarisce le strutture e i dati da utilizzare;imprecisione nel secondo punto; e2:la struttura non è implicita e mancano dettagli; e3: mancano i dettagli
468827	18	9	0	9	e1:discute solo il caso di pesi interi; 2: non svolto; e3: manca la descrizione di come si costruisce il grafo
479526	22	4	9	9	e1: description and analysis unclear; e2: the tree is not implicit; e3: missing how edges are set up
483633	29	9	10	10	e1: poco chiaro come pesca gli archi e la gestione di p[]; e2: buona l'idea ma l'analisi non è descritta bene
484837	29	10	9	10	e1: non chiaro cosa succede nelle liste di adiacenza quando due nodi sono uniti; e2: un po' tirato via.
490068	29	9	10	10	e1:scelta dell'arco non uniforme (il nodo va sceltoin base al suo peso)
494087	25	10	6	9	e1:piccola svista;e3:le condizioni non caratterizzano completamente il grafo
494577	25	8	9	8	e1:la scelta dell'arco non è uniforme e l'analisi ha un passaggio poco chiaro; e2:manca la regola come scendere da padre in figlio e piccola svista nell'analisi; e3: descrizione poco chiara
498122	27	9	8	10	e1: non dice come aggiorna le altre liste; e2: l'albero ottenuto non è implicito;
527349	24	6	9	9	e1:manca il primo punto; e3: descrizione incomlpeta del grafo; e2:manca la regola come scendere da padre in figlio
528025	22	4	9	9	e1: non sono chiare tutte le regole utilizzate; mancano gli altri due punti
533408	29	10	9	10	e2:non specifica la regola per mavigare;
533772	25	9	6	9	e1:edges are not chosen uniformly at random; e2:missing the rules to navigate;
534789	19	9	0	10	e1: how the other lists are updated? Missing probabillity of success for wieghed graphs; e2: not done;
537580	12	2	0	10	e1: only one point done; e2: not done;
538315	28	10	8	10	e2: manca regola per navigare e alcuni dettagli/analisi costruzione
539276	25	9	6	9	e1: analysis is not clear in the seocnd point; e2: missing details on the costruction and rules to navigate;

Examination of Jan. 20, 2016

463883 27
468827 26
541769 18

Examination of Feb. 10, 2016

300759 21
459410 30
489617 29
493413 29
540574 11
541451 12
541769 27
541784 24