Statistics for Data Science (628PP) A.Y. 2024/25
Instructors
- Francesco Giannini
- Università di Pisa
- Office hours: TBD or by appointment, at the Department of Computer Science, room 385/DO, or via Teams.
- Salvatore Ruggieri
- Università di Pisa
- Office hours: Tuesdays h 16:00 - 18:00 or by appointment, at the Department of Computer Science, room 321/DO, or via Teams.
Hours and rooms
| Day of Week | Hour | Room |
|---|---|---|
| Tuesday | 14:00 - 16:00 | Fib-C |
| Wednesday | 9:00 - 11:00 | Fib-C1 |
| Thursday | 9:00 - 11:00 | Fib-C1 |
Pre-requisites
Students should be comfortable with most of the topics on mathematical calculus covered in:
- [P] J. Ward, J. Abdey. Mathematics and Statistics. University of London, 2013. Chapters 1-8 of Part 1.
Extra lessons to refresh these notions may be scheduled in the first part of the course.
Mandatory Teaching Material
The following are mandatory textbooks:
- [T] F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester. A Modern Introduction to Probability and Statistics. Springer, 2005.
- [R] P. Dalgaard. Introductory Statistics with R. 2nd edition, Springer, 2008.
- selected chapters of other books for advanced topics
Software
Preliminary program and calendar
Exams
There are no mid-terms. The exam consists of a written part and an oral part. The written part consists of exercises and questions on the course topics. Each question is assigned a number of points, for a total of 30. Sample written exams: sample1, sample2. Students are admitted to the oral part if they score at least 18 points on the written part.
The oral part consists of a critical discussion of the written part, plus open questions and problem solving on the course topics (both theory and R programming). In particular, students must demonstrate that they can summarize both the theory and the software related to any of the lessons, using the slides and R scripts of those lessons.
Registration for exams is mandatory (beware of the registration deadline!): register here. The dates below are only for the written test (normal exam).
Dates for project discussion are included in the project description.
| Date | Hour | Room | Notes |
|---|---|---|---|
Student project
- The project replaces the written part of the examination.
- The project description, rules, and Q&A will be published here in April.
Class calendar
A Teams channel is used to post news, notes, Q&A, and other stuff related to the course.
Lectures are held in person only and will NOT be live-streamed. Recordings from previous years are available for non-attending students (see the past years section below); however, these materials may not fully correspond to the content taught in the current academic year.
Material for future lessons refers to the previous academic year. Slides and R scripts may be updated after each class to align with the actual content of the current year and to correct typos. Be sure to download the updated versions.
| # | Date | Room | Topic | Mandatory teaching material |
|---|---|---|---|---|
| 01 | 17/02 11-13 | Fib-E | Introduction. Probability and independence. rec01 (.mp4) | [T] Chpts. 1-3 slides01 (.pdf) |
| 02 | 17/02 14-16 | Fib-C | R basics. rec02 (.mp4) | [R] Chpts. 1,2.1-2.3 slides02 (.pdf), script02 (.R) |
| 03 | 20/02 11-13 | Fib-A1 | Bayes' rule and applications. rec03 (.mp4) | [T] Chpt. 3 slides03 (.pdf), script03 (.R) |
| 04 | 24/02 14-16 | Fib-C | Discrete random variables. rec04 (.mp4) | [T] Chpts. 4, 9.1, 9.2, 9.4 [R] Chpt. 3 slides04 (.pdf), script04 (.R) |
| 05 | 25/02 14-16 | Fib-A1 | Discrete random variables (continued). rec05 (.mp4) | |
| 06 | 27/02 11-13 | Fib-A1 | Refresher: derivatives and integrals. rec06 (.mp4) | [P] Chpts. 1-8 slides06 (.pdf), script06 (.R) |
| 07 | 03/03 14-16 | Fib-C | R data access and programming. rec07 (.mp4) | [R] Chpt. 2.3,2.4 script07 (.zip) |
| 08 | 04/03 14-16 | Fib-A1 | Continuous random variables. rec08 (.mp4) | [T] Chpts. 5, 9.2-9.4 [R] Chpt. 3 slides08 (.pdf), script08 (.R) |
| 09 | 06/03 11-13 | Fib-A1 | Expectation and variance. Computations with random variables. rec09 (.mp4) | [T] Chpts. 7,8 slides09 (.pdf), script09 (.R) |
| 10 | 10/03 14-16 | Fib-C | Expectation and variance. Computations with random variables (continued). Moments. Functions of random variables. rec10 (.mp4) | [T] Chpts. 9-11 slides10 (.pdf), script10 (.zip) |
| 11 | 11/03 14-16 | Fib-A1 | Functions of random variables (continued). Distances between distributions. rec11 (.mp4) | Murphy's book Chpt. 6 slides11 (.pdf), script11 (.R) |
| 12 | 13/03 11-13 | Fib-A1 | Simulation. rec12 (.mp4) | [T] Chpts. 6.1-6.2 slides12 (.pdf), script12 (.R) script12_sol07 (.R) |
| 13 | 17/03 14-16 | Fib-C | Power laws and Zipf's law. rec13 (.mp4) | Newman's paper Sect I, II, III(A,B,E,F) slides13 (.pdf), script13 (.R) |
| 14 | 18/03 14-16 | Fib-A1 | Law of large numbers. The central limit theorem. rec14 (.mp4) | [T] Chpts. 13-14 slides14 (.pdf), script14 (.R) |
| 15 | 20/03 11-13 | Fib-A1 | Graphical summaries. Kernel Density Estimation. rec15 (.mp4) | [T] Chpt. 15, [R] Chpt. 4 slides15 (.pdf), script15 (.R) |
| 16 | 24/03 14-16 | Fib-C | Numerical summaries. rec16 (.mp4) | [T] Chpt. 16, [R] Chpt. 4 slides16 (.pdf), script16 (.R) |
| 17 | 25/03 14-16 | Fib-A1 | Data preprocessing in R. Estimators. rec17 (.mp4) | [R] Chpt. 10, [T] Chpts. 17.1-17.3 script17 (.R), dataprep.R |
| 18 | 27/03 11-13 | Fib-A1 | Unbiased estimators. Efficiency and MSE. rec18 (.mp4) | [T] Chpts. 19, 20 slides18 (.pdf), script18 (.R) |
| 19 | 31/03 14-16 | Fib-C | Maximum likelihood estimation. rec19 (.mp4) | [T] Chpt. 21 s4dsln.pdf Chpt. 1 slides19 (.pdf), script19 (.R) |
| 20 | 01/04 14-16 | Fib-A1 | Linear regression. Least squares estimation. rec20 (.mp4) | [T] Chpts. 17.4,22 [R] Chpt. 6 s4dsln.pdf Chpt. 2 slides20 (.pdf), script20 (.R) |
| 21 | 03/04 11-13 | Fib-A1 | Non-linear and multiple linear regression. rec21 (.mp4) | [R] Chpt. 12.1,13,16.1-16.2 s4dsln.pdf Chpt. 2 slides21 (.pdf), script21 (.R) |
| 22 | 07/04 14-16 | Fib-C | Issues with linear regression. Logistic regression. rec22 (.mp4) | [R] Chpt. 12.1,13,16.1-16.2 slides22 (.pdf), script22 (.zip) |
| 23 | 08/04 14-16 | Fib-A1 | Statistical decision theory. rec23 (.mp4) | s4dsln.pdf Chpt. 4 slides23 (.pdf), script23 (.R) |
| 24 | 10/04 11-13 | Fib-A1 | Statistical decision theory (continued). rec24 (.mp4) | |
| 25 | 14/04 14-16 | Fib-C | Statistical decision theory (continued). Project presentation. | |
| 26 | 15/04 14-16 | Fib-A1 | Confidence intervals: mean, proportion, linear regression. rec26 (.mp4) | [T] Chpts. 23.1,23.2,23.4,24.3,24.4 s4dsln.pdf Chpt. 3 slides26 (.pdf), script26 (.R) |
| 27 | 17/04 11-13 | Fib-A1 | Confidence intervals (continued). Bootstrap and resampling methods. rec27 (.mp4) | [T] Chpts. 18.1-18.3,23.3 slides27 (.pdf), script27 (.R) |
| 28 | 24/04 11-13 | Fib-A1 | Bootstrap and resampling methods (continued). rec28 (.mp4) | |
| 29 | 28/04 14-16 | Fib-C | Hypothesis testing. One-sample tests of the mean and application to linear regression. rec29 (.mp4) | [T] Chpts. 25,26,27, [R] Chpts. 5.1,5.2 s4dsln.pdf Chpt. 3.3 slides29 (.pdf), script29 (.R) |
| 30 | 29/04 14-16 | Fib-A1 | One-sample tests of the mean and application to linear regression (continued). Classifier performance metrics in R. rec30 (.mp4) | slides30 (.pdf), script30 (.R) |
| 31 | 05/05 14-16 | Fib-C | Two-sample tests of the mean and applications to classifier comparison. rec31 (.mp4) | [T] Chpt. 28, [R] Chpts. 5.3-5.7 slides31 (.pdf), script31 (.R) |
| 32 | 06/05 14-16 | Fib-A1 | Multiple-sample tests of the mean and applications to classifier comparison. rec32 (.mp4) | [R] Chpt. 7 slides32 (.pdf), script32 (.R) |
| 33 | 08/05 11-13 | Fib-A1 | Fitting distributions. Testing independence/association. rec33 (.mp4) | [R] Chpt. 8 K-S, slides33 (.pdf), script33 (.R) |
| s03 | 12/05 14-16 | Fib-C | Mandatory seminar: Introduction to causal modeling and reasoning. Speakers: I. Beretta and M. Cinquini. rec_s03 (.mp4) | slides_s03 (.pdf) |
| 34 | 13/05 14-16 | Fib-A1 | Fitting distributions. Testing independence/association (continued). Project Q&A. | |
| 35 | 15/05 11-13 | Fib-A1 | Project Q&A. | |
