====== Big Data Analytics A.A. 2019/20 ======
Instructors:
* Fosca Giannotti, Luca Pappalardo
* KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa
* http://www-kdd.isti.cnr.it
* fosca [dot] giannotti [at] isti [dot] cnr [dot] it
* luca [dot] pappalardo [at] isti [dot] cnr [dot] it
====== Learning goals ======
In our digital society, every human activity is mediated by information technologies and therefore leaves digital traces behind. These massive traces are stored in public or private repositories: phone call records, movement trajectories, soccer logs and social media records are all examples of “Big Data”, a novel and powerful “social microscope” for understanding the complexity of our societies. Analyzing big data sources is a complex task that requires knowledge of several technological and methodological tools.
This course has three objectives:
* to introduce the emerging field of big data analytics and social mining;
* to introduce the technological scenario of big data, such as programming tools to analyze big data, query NoSQL databases, and perform predictive modeling;
* to guide students in developing an open-source and reproducible big data analytics project, based on the analysis of real-world datasets.
====== Module 1: Big Data Analytics and Social Mining ======
In this module, analytical methods and processes are presented through exemplary case studies in challenging domains, organized according to the following topics:
* The Big Data Scenario and the new questions to be answered
* Sports Analytics
- Soccer data landscape and injury prediction
- Analysis and evolution of sports performance
* Mobility Analytics
- Mobility data landscape and mobility data mining methods
- Understanding Human Mobility with vehicular sensors (GPS)
- Mobility Analytics: Novel Demography with mobile-phone data
* Social Media Mining
- The social media data landscape: Facebook, LinkedIn, Twitter, Last.fm
- Sentiment analysis: examples from human migration studies
- Discussion on ethical issues of Big Data Analytics
* Well-being and Nowcasting
- Nowcasting influenza with retail market data
- Predicting well-being from human mobility patterns
* Paper presentations by students
====== Module 2: Big Data Analytics Technologies ======
This module provides students with the technologies to collect, manipulate and process big data. In particular, the following tools will be presented:
* Python for Data Science
* The Jupyter Notebook: developing open-source and reproducible data science
* MongoDB: fast querying and aggregation in NoSQL databases
* GeoPandas: analyze geo-spatial data with Python
* Scikit-learn: programming tools for data mining and analysis
* M-Atlas: a toolkit for mobility data mining
====== Module 3: Laboratory for Interactive Project Development ======
During the course, teams of students will be guided in the development of a big data analytics project. The projects will be based on real-world datasets covering several thematic areas. Discussions and presentations will take place in class at different stages of project execution:
* Data Understanding and Project Formulation
* Mid-Term Project Results
* Final Project Results
====== Calendar ======
16/09 (Mod. 1) Introduction to the course; the Big Data scenario. Slides: mod1.introduction_bigdatalandscape_newquestions_.pdf
===== Exam =====
The two mid-terms account for 40% of the final grade; the remaining 60% comes from the evaluation of the project and its discussion (prepare some slides to present your project).
A final test on the technologies can be taken if the mid-term results are not sufficient.
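As a rough illustration of the 40% / 60% split stated above, the sketch below combines the scores into a single grade. It rests on assumptions that the syllabus does not state: that all scores are on the same scale, and that the two mid-terms count equally within their 40% share. The function name `final_grade` and the sample scores are hypothetical.

```python
def final_grade(midterm_1: float, midterm_2: float, project: float) -> float:
    """Combine scores using the 40% / 60% split from the syllabus.

    Assumptions (not stated in the syllabus): all scores share one scale,
    and the two mid-terms are averaged with equal weight.
    """
    midterm_avg = (midterm_1 + midterm_2) / 2
    return 0.4 * midterm_avg + 0.6 * project


# Hypothetical scores, for illustration only.
grade = final_grade(24, 28, 30)
print(round(grade, 1))
```

The actual rules for combining the two mid-terms, and for the optional final test on technologies, are decided by the instructors and may differ from this sketch.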
The following table describes the expected content of a project:
====== Previous Big Data Analytics websites ======
Big Data Analytics A.A. 2018/19
Big Data Analytics A.A. 2017/18
Big Data Analytics A.A. 2016/17
Big Data Analytics A.A. 2015/16