Strumenti Utente

Strumenti Sito


magistraleinformatica:alg2:algo2_15:start

Year 2015-2016

Prof. Roberto Grossi
Tutor: Dr. Filippo Geraci

Announcements

  • Results and scheduling of the oral examinations in the following days: Tue, Jan 12, 2016, at 9:00 in my office.
  • New exercises (Dec. 15) added to the partial list of problems.
  • Next terms: Jan. 20 at 9:00 in room L1; Feb. 10 at 9:00 in room L1.
  • Office hours: Tue 14-16 (Dipartimento di Informatica)

Overview

We will study, design and analyze advanced algorithms and data structures for the efficient solution of combinatorial problems involving all basic data types, such as integers, strings, (geometric) points, trees and graphs. This course deepens and extends the algorithmic notions of students.

The syllabus is structured to highlight the applicative scenarios in which the studied algorithms and data structures can be successfully applied. The level of detail with which each argument will be dealt with can change year-by-year, and will be decided according to requests coming from other courses and/or specific issues arising in, possibly novel, applicative scenarios.

The advanced nature of this course focuses on developing algorithmic design skills. The students are exposed to complex problems that require a significant effort in problem solving. One “brainstorming” lecture per week is devoted to a full immersion in this activity, and it is highly recommend to attend it. The purpose is not that of teaching/learning further N algorithms, but to refine students' skills. The final written exam will be based on the topics discussed during the “brainstorming” lectures.

We like students that make mistakes in a constructive way, since this is the starting point of our brainstorming to drive their reasoning paths, thus learning by mistakes. We are interested in the route that leads to the algorithmic solution, rather than the solution itself. Regarding mistakes and intelligence, Alan Turing wrote in 1947 that “… if a machine is expected to be infallible, it cannot also be intelligent”. This is a motto for this class, with a kind invitation to express your own ideas even if they could be apparently wrong.

Topics

Please note that several topics are the outcome of recent advancement in algorithms and data structures, and thus most of the course material consists in research papers or book chapters.

Date Topics References and notes
Sept. 22 Introduction to the course. Entrance exam. -
Sept. 24 Discussion of the solutions for the exercises of the entrance exam. see snapshots
Sept. 28 Problem solving. Statistics on arrays (top-k elements). see snapshots
Sept. 29 Linear-time worst-case selection. par.9.3

Randomization, hashing and data streaming

Randomization is now a common powerful tool to solve large-scale problems. After introducing the concept of randomized algorithms and hashing, we apply them to solve data streaming problems, a field emerged in the last decade. Here data flow as a stream and one-pass algorithms with limited memory can process it. We focus on the count-min sketch paradigm and its applications. [Note: to refresh the basic notions on counting and probability, please refer to Appendix C in Cormen-Leiserson-Rivest-Stein's book “Introduction to Algorithms”, 3rd ed., MIT Press.]

Date Topics References and notes
Oct. 1 Las Vegas and Montecarlo algorithms. Indicator variables and randomised quicksort (Las Vegas). par. 7.3
Oct. 5 Problem solving. Randomized selection (Las Vegas). list of problems
Oct. 6 Random indicator variables: secretary problem (and random permuting), birthday paradox. par. 5.1-5.3, 5.4.1
Oct. 8 Karp-Rabin fingerprints: randomized checking and pattern matching (Montecarlo and Las Vegas) par.7.4-7.6
Oct. 12 Randomized min-cut in unweighted graphs (Montecarlo) par.1.1
Oct. 13 Randomness and Kolmogorov complexity Notes
Oct. 15 Lecture by prof. Ferragina: issue of hierarchical memories: I/O-model. Index construction: multi-way mergesort, BSBI and SPIMI. Sketch on MapReduce. Distributed indexing: Term-based vs Doc-based partitioning. Slides Chapt.4
Oct. 19 Problem solving. Randomized min-cut in graphs. list of problems
Oct. 20 General assembly
Oct. 22 Data Streaming Model. Motivations and examples. Count-Min Sketches sects.1-3 Site Notes
Oct. 26 Problem solving. Data streaming. list of problems
Oct. 27 Chernoff bounds, and applications of count-min sketches. par.4.1,Th.4.1 (proof is optional) sect.4 Notes
Oct. 29 Using a family of random hash functions: Cuckoo hashing. Notes
Nov. 9 Problem solving. Hash functions and data streaming. list of problems

Algorithms for massive data sets and hierarchical memories

Massive data sets are one of the key elements in our so-called big data society. In this scenario the design and analysis of algorithms and data structures adopt the external and cache-oblivious models, which take into account the memory hierarchy that provides performance to modern computers. We study basic problems on sorting and searching in these models, comparing the results with the classical RAM model of computation.

Date Topics References and notes
Nov. 10 Review of the external memory model (EMM): sequential access versus random access, I/O complexity. Static and dynamic searching in external memory. (a,b)-trees and B-trees: definition, search and insert operations. Introduction, chapt.2 and 3 (set D=1 disks)sect.1 and 2
Nov. 12 Dynamic searching in external memory: (a,b)-trees and B-trees. Deletion, construction, and I/O analysis. Lower bound for the I/O complexity of sorting. sect.2 Notes
Nov. 16 Problem solving. Hash functions and data streaming. list of problems
Nov. 17 Cache-oblivious (CO) model. Difference from the EM model and examples. Sect.1-3
Nov. 17 Problem solving. Algorithms for external memory (continued). list of problems
Nov. 19 Cache-oblivious searching: van Emde Boas layout and CO B-trees. [Sect.4.1]
Nov. 23 Problem solving. CO algorithms. list of problems
Nov. 24 Cache-oblivious layout of binary trees. Notes Sect.5 (see 5.4)
Nov. 26 Problem solving. CO algorithms (continued). list of problems
Nov. 30 Map-Reduce paradigm. Suffix array construction (DC3). paper example DC3 algorithm (up to Section3)

Enumeration, hardness and approximation of some combinatorial problems

NP-hard problems are important but difficult to solve and no deterministic polynomial algorithms are currently known for them. Even basic problems, such as finding a simple path of vertices in a graph, are NP-hard. We discuss how to attack these problems (a) by counting and listing their solutions where the cost is proportional to the output (it can be exponential in the worst case but we only pay for what we get) and (b) by approximating their solutions with polynomial-time algorithms when the problem requires to minimize a cost or maximize a benefit. Most of the addressed problems are on graphs, which are popular representations for modern networked data.

Date Topics References and notes
Dec. 1 Quick review of NP-completeness and NP-hardness. Euler tour vs Hamiltonian tour vs TravelSalesman Problem (TSP). [CLRS, chap.34 (up to 34.4 plus 34.5.4, plus 35.2.2]
Dec. 1 Problem solving. MapReduce. Euler tour. list of problems
Dec. 3 2-approximation for metric TSP and for Min Vertex Cover. [CLRS, 35.1, 35.2]
Dec. 7 Approximations for the bin packing problem. chapt.2: par. 2.2.2
Dec. 10 Problem solving. Hard problems. list of problems
Dec. 14 Approximation for the MAX-CUT and knapsack problems. Notes chapt.2: par. 2.1.1
Dec. 15 Problem solving. Approximation algorithms. list of problems
Activity in class
Official documents for the course
Spot yourself in the classroom

Examination outcomes (in Italian)

Examination of Dec. 16, 2015

matricolascoree1e2e3notes
300759----esercitazione
4415172910910e2: non è molto chiara la costruzione.
45205820488e1: only the first point done; e2: search and analysis missing; e3: missing generalization
46598221777e1:non chiarisce le strutture e i dati da utilizzare;imprecisione nel secondo punto; e2:la struttura non è implicita e mancano dettagli; e3: mancano i dettagli
46882718909e1:discute solo il caso di pesi interi; 2: non svolto; e3: manca la descrizione di come si costruisce il grafo
47952622499e1: description and analysis unclear; e2: the tree is not implicit; e3: missing how edges are set up
4836332991010e1: poco chiaro come pesca gli archi e la gestione di p[]; e2: buona l'idea ma l'analisi non è descritta bene
4848372910910e1: non chiaro cosa succede nelle liste di adiacenza quando due nodi sono uniti; e2: un po' tirato via.
4900682991010e1:scelta dell'arco non uniforme (il nodo va sceltoin base al suo peso)
494087251069e1:piccola svista;e3:le condizioni non caratterizzano completamente il grafo
49457725898e1:la scelta dell'arco non è uniforme e l'analisi ha un passaggio poco chiaro; e2:manca la regola come scendere da padre in figlio e piccola svista nell'analisi; e3: descrizione poco chiara
498122279810e1: non dice come aggiorna le altre liste; e2: l'albero ottenuto non è implicito;
52734924699e1:manca il primo punto; e3: descrizione incomlpeta del grafo; e2:manca la regola come scendere da padre in figlio
52802522499e1: non sono chiare tutte le regole utilizzate; mancano gli altri due punti
5334082910910e2:non specifica la regola per mavigare;
53377225969e1:edges are not chosen uniformly at random; e2:missing the rules to navigate;
534789199010e1: how the other lists are updated? Missing probabillity of success for wieghed graphs; e2: not done;
537580122010e1: only one point done; e2: not done;
5383152810810e2: manca regola per navigare e alcuni dettagli/analisi costruzione
53927625969e1: analysis is not clear in the seocnd point; e2: missing details on the costruction and rules to navigate;

Examination of Jan. 20, 2016

  • 463883 27
  • 468827 26
  • 541769 18

Examination of Feb. 10, 2016

  • 300759 21
  • 459410 30
  • 489617 29
  • 493413 29
  • 540574 11
  • 541451 12
  • 541769 27
  • 541784 24
magistraleinformatica/alg2/algo2_15/start.txt · Ultima modifica: 15/02/2016 alle 07:37 (18 mesi fa) da Roberto Grossi