====== Statistical Methods for Data Science A.Y. 2019/20 ======
=====Instructor=====
* Salvatore Ruggieri
* Università di Pisa
* http://pages.di.unipi.it/ruggieri/
* salvatore [dot] ruggieri [at] unipi [dot] it
* Office hours
* Tuesday, 14:00 - 17:00, Department of Computer Science, room 321/DO.
* Office hours are held only via Skype. Skype contact: salvatore.ruggieri
=====Classes=====
^ Day of Week ^ Hour ^ Room ^
| Tuesday | 16:00 - 18:00 | Fib-L1 Distance Learning |
| Wednesday| 9:00 - 11:00 | Fib-A1 Distance Learning |
=====Pre-requisites=====
Students should be comfortable with most of the topics on mathematical calculus covered in:
* [P] J. Ward, J. Abdey. Mathematics and Statistics. University of London, 2013. Chapters 1-8 of Part 1.
Extra lessons to refresh these notions may be scheduled in the first part of the course.
=====Mandatory Teaching Material=====
The following textbooks are mandatory:
* [T] F.M. Dekking, C. Kraaikamp, H.P. Lopuhaä, L.E. Meester. A Modern Introduction to Probability and Statistics. Springer, 2005.
* [R] P. Dalgaard. Introductory Statistics with R. 2nd edition, Springer, 2008.
=====Software=====
* R
* RStudio
=====Preliminary program and calendar=====
* Preliminary program.
* Calendar of lessons.
=====Student project=====
* The project can be done in groups of at most 3 students.
* The project (report + code) must be delivered by the end of July.
* The oral discussion must take place by the September session; it covers both the project and all topics of the course.
* The project replaces the written exam, but students must still register for a written exam date in order to fill in the student questionnaire.
* Groups ready for the discussion should send the project to the teacher, together with their available time slots for the oral discussion.
* Project presentation slides, project info audio-video (.flv), and project data audio-video (.flv).
* Google Drive project directory (accessible only to authorized students)
=====Written exam=====
There are no mid-terms. The exam consists of a written part and an oral part. The written part consists of open questions and exercises on the topics of the course. Each question is assigned a number of points, and the points sum up to 30. Students are admitted to the oral part if they score at least 18 points. Example written texts: sample1, sample2. The oral part consists of a critical discussion of the written part, plus open questions and problem solving on the topics of the course.
Online exams: during the COVID-19 restrictions, both the written part and the oral part will be held online. For the written part, students will connect to Google Meet (room code: 500PP) and will turn on both microphone and webcam. Each sheet must include the student's name, surname, and student ID, and must be signed. A picture of the sheets must be sent to ruggieri [at] di [dot] unipi [dot] it.
Registration for exams is mandatory (check the registration deadline!): register here
^ Date ^ Hour ^ Room ^ Notes ^
| 19/01/2021 | 16:00 - 18:00 | Online exam | |
| 09/02/2021 | 16:00 - 18:00 | Online exam | |
=====Class calendar=====
Distance-learning lessons: see the instructions for Google Meet and use room code 500PP.
^ ^ Date ^ Room ^ Topic ^ Learning material ^
|1| 25.02 16:00-18:00 | L1 | Introduction. Probability and independence. | [T] Chpts. 1-3 |
|2| 26.02 9:00-11:00 | A1 | R basics. | [R] Chpts. 1,2.1,2.2 slides script1.R |
|3| 03.03 16:00-18:00 | L1 | Discrete random variables. | [T] Chpt. 4 [R] Chpt. 3 script2.R |
|4| 04.03 9:00-11:00 | A1 | Continuous random variables. Simulation. | [T] Chpts. 5, 6.1-6.2 [R] Chpt. 3 script3.R |
|5| 10.03 16:00-18:00 | Distance-learning | Refresher: derivatives and integrals. rec01 audio-video (.flv) | [P] Chpts. 1-8 scriptMath.R |
|6| 11.03 9:00-11:00 | Distance-learning| Expectation and variance. R data access. rec02 audio-video (.flv) | [T] Chpt. 7 [R] Chpt. 2.4 script4.R |
|7| 17.03 16:00-18:00 | Distance-learning | R programming. Project presentation. rec03 audio-video (.flv) and project info audio-video (.flv) | [R] Chpt. 2.3 exercise.R script5.zip |
|8| 18.03 9:00-11:00 | Distance-learning | Project presentation. Power laws and Zipf laws. rec04 audio-video (.flv) | Newman's paper Sect I, II, III(A,B,E,F) script6.R |
|9| 24.03 16:00-18:00 | Distance-learning | Computations with random variables. Joint distributions. rec05 audio-video (.flv) | [T] Chpts. 8-9 script7.zip |
|10| 25.03 9:00-11:00 | Distance-learning | Covariance. Sum of random variables. rec06 audio-video (.flv) | [T] Chpts. 10-11 script8.R |
|11| 31.03 16:00-18:00 | Distance-learning | Law of large numbers. The central limit theorem. rec07 audio-video (.flv) | [T] Chpts. 13-14 script9.R |
|12| 01.04 9:00-11:00 | Distance-learning | Graphical summaries. rec08 audio-video (.flv) | [T] Chpt. 15 script10.R |
|13| 07.04 16:00-18:00 | Distance-learning | Numerical summaries. Data preprocessing in R. Q&A on the project. rec09 audio-video (.flv), project data audio-video (.flv) | [T] Chpt. 16, [R] Chpts. 4,10 script11.R, dataprep.R |
|14| 08.04 9:00-11:00 | Distance-learning | Unbiased estimators. Efficiency and MSE. rec10 audio-video (.flv) | [T] Chpts. 17.1-17.3, 19, 20 script12.R |
|XX| 15.04 9:00-11:00 | | No lesson on this date. Students work on the project on their own. | |
|15| 21.04 16:00-18:00 | Distance-learning | Maximum likelihood. Fisher information. rec11 audio-video (.flv) | [T] Chpt. 21 notes1.pdf script13.R |
|16| 22.04 9:00-11:00 | Distance-learning | Simple linear and polynomial regression. Least squares. rec12 audio-video (.flv) | [T] Chpts. 17.4,22 [R] Chpts. 6,12.1 script14.R |
|17| 28.04 16:00-18:00 | Distance-learning | Multiple, non-linear, and logistic regression. rec13 audio-video (.flv) | [R] Chpts. 13, 16.1-16.2 notes2.pdf script15.R |
|18| 29.04 9:00-11:00 | Distance-learning | Confidence intervals: Gaussian, Student's t, large sample method. rec14 audio-video (.flv) | [T] Chpts. 23.1,23.2,23.4, 24.3,24.4 script16.R |
|19| 05.05 16:00-18:00 | Distance-learning | Confidence intervals in linear regression. Empirical bootstrap. Application to confidence intervals. rec15 audio-video (.flv) | [T] Chpts. 18.1,18.2,23.3 notes2.pdf script17.R |
|20| 06.05 9:00-11:00 | Distance-learning | Parametric bootstrap. Hypothesis testing. rec16 audio-video (.flv) | [T] Chpts. 18.3,25 script18.R |
|21| 12.05 16:00-18:00 | Distance-learning | One-sample t-test and application to linear regression. rec17 audio-video (.flv) | [T] Chpts. 26-27, [R] Chpts. 5.1,5.2 notes2.pdf script19.R |
|22| 13.05 9:00-11:00 | Distance-learning | Goodness of fit: chi-square, K-S. Fitting power laws. rec18 audio-video (.flv) | K-S script20.R |
|XX| 19.05 16:00-18:00 | | No lesson on this date. Students work on the project on their own. | |
|23| 20.05 9:00-11:00 | Distance-learning | Hypothesis testing: F-test, comparing two samples. rec19 audio-video (.flv) | [T] Chpt. 28, [R] Chpts. 5.3-5.7 script21.R |
|XX| 26.05 16:00-18:00 | | No lesson on this date. Students work on the project on their own. | |
|24| 27.05 9:00-11:00 | Distance-learning | Project tutoring. rec20 audio-video (.flv) | |
=====Previous years=====
* Statistical Methods for Data Science A.Y. 2018/19
* Statistical Methods for Data Science A.Y. 2017/18
* Statistical Methods for Data Science A.Y. 2016/17