====== Big Data Analytics A.A. 2019/20 ======

Instructors:

  * Fosca Giannotti, Luca Pappalardo
  * KDD Laboratory, Università di Pisa and ISTI-CNR, Pisa
  * fosca [dot] giannotti [at] isti [dot] cnr [dot] it
  * luca [dot] pappalardo [at] isti [dot] cnr [dot] it

====== Learning goals ======

In our digital society, every human activity is mediated by information technologies and therefore leaves digital traces behind. These massive traces are stored in public or private repositories: phone call records, movement trajectories, soccer logs and social media records are all examples of "Big Data", a novel and powerful "social microscope" for understanding the complexity of our societies. Analyzing big data sources is a complex task that requires knowledge of several technological and methodological tools. This course has three objectives:

  * introducing the emerging field of big data analytics and social mining;
  * introducing the technological scenario of big data, such as programming tools to analyze big data, query NoSQL databases, and perform predictive modeling;
  * guiding students through the development of an open-source and reproducible big data analytics project, based on the analysis of real-world datasets.

====== Module 1: Big Data Analytics and Social Mining ======

In this module, analytical methods and processes are presented through exemplary case studies in challenging domains, organized according to the following topics:

  * The Big Data Scenario and the new questions to be answered
  * Sport Analytics
    - Soccer data landscape and injury prediction
    - Analysis and evolution of sports performance
  * Mobility Analytics
    - Mobility data landscape and mobility data mining methods
    - Understanding Human Mobility with vehicular sensors (GPS)
    - Mobility Analytics: Novel Demography with mobile-phone data
  * Social Media Mining
    - The social media data landscape: Facebook, LinkedIn, Twitter, Last.fm
    - Sentiment analysis: example from human migration studies
    - Discussion on ethical issues of Big Data Analytics
  * Well-being & Nowcasting
    - Nowcasting influenza with retail market data
    - Predicting well-being from human mobility patterns
  * Paper presentations by students

====== Module 2: Big Data Analytics Technologies ======

This module will provide students with the technologies to collect, manipulate and process big data. In particular, the following tools will be presented:

  * Python for Data Science
  * The Jupyter Notebook: developing open-source and reproducible data science
  * MongoDB: fast querying and aggregation in NoSQL databases
  * GeoPandas: analyzing geo-spatial data with Python
  * Scikit-learn: programming tools for data mining and analysis
  * M-Atlas: a toolkit for mobility data mining

====== Module 3: Laboratory for Interactive Project Development ======

During the course, teams of students will be guided in the development of a big data analytics project. The projects will be based on real-world datasets covering several thematic areas. Discussions and presentations will take place in class at different stages of the project:

  * Data Understanding and Project Formulation
  * Mid-Term Project Results
  * Final Project Results

====== Calendar ======

  * 16/09 (Mod. 1): Introduction to the course, the Big Data scenario (mod1.introduction_bigdatalandscape_newquestions_.pdf)

===== Exam =====

The two mid-terms account for 40% of the final grade; the remaining 60% comes from the evaluation of the project and its discussion (prepare slides to present your project). Students may take a final test on the technologies if their mid-term results are not sufficient. The following table describes the expected content of a project:

====== Previous Big Data Analytics websites ======

  * Big Data Analytics A.A. 2018/19
  * Big Data Analytics A.A. 2017/18
  * Big Data Analytics A.A. 2016/17
  * Big Data Analytics A.A. 2015/16

