====== Big Data Analytics A.A. 2021/22 ======

Lectures will also be held remotely, through the Teams team named "599AA 21/22 - BIG DATA ANALYTICS [WDS-LM]".

Instructors:
  * **Luca Pappalardo**
    * KDD Laboratory, Università di Pisa and ISTI-CNR, Pisa
    * [[http://www-kdd.isti.cnr.it]]
    * [[luca.pappalardo@isti.cnr.it]]

Tutor:
  * **Giuliano Cornacchia**
    * KDD Laboratory, Università di Pisa and ISTI-CNR, Pisa
    * [[http://www-kdd.isti.cnr.it]]
    * giuliano.cornacchia@phd.unipi.it

Timetable
  * Wednesday 09:00 - 10:45 Aula Fib M1
  * Tuesday 09:00 - 10:45 Aula Fib C1

====== Learning goals ======

In our digital society, every human activity is mediated by information technologies and therefore leaves digital traces behind. These massive traces are stored in public or private repositories: phone call records, movement trajectories, soccer-logs and social media records are all examples of “Big Data”, a novel and powerful “social microscope” to understand the complexity of our societies. The analysis of big data sources is a complex task that requires knowledge of several technological and methodological tools. This course has three objectives:
  * introduce the emerging field of big data analytics and social mining;
  * introduce the technological scenario of big data, such as programming tools to analyze big data, query NoSQL databases, and perform predictive modeling;
  * guide students in the development of an open-source and reproducible big data analytics project, based on the analysis of real-world datasets.

====== Module 1: Big Data Analytics and Social Mining ======

In this module, analytical methods and processes are presented through exemplary case studies in challenging domains, organized according to the following topics:
  * The Big Data Scenario and the new questions to be answered
  * Sport Analytics
    - Soccer data landscape and injury prediction
    - Analysis and evolution of sports performance
  * Mobility Analytics
    - Mobility data landscape and mobility data mining methods
    - Understanding Human Mobility with vehicular sensors (GPS)
    - Mobility Analytics: Novel Demography with mobile-phone data
  * Social Media Mining
    - The social media data landscape: Facebook, LinkedIn, Twitter, Last.fm
    - Sentiment analysis: an example from human migration studies
    - Discussion on ethical issues of Big Data Analytics
  * Well-being & Nowcasting
    - Nowcasting influenza with retail market data
    - Predicting well-being from human mobility patterns
  * Paper presentations by students

====== Module 2: Big Data Analytics Technologies ======

This module will provide students with the technologies to collect, manipulate and process big data. In particular, the following tools will be presented:
  * Python for Data Science
  * The Jupyter Notebook: developing open-source and reproducible data science
  * MongoDB: fast querying and aggregation in NoSQL databases
  * GeoPandas: analyzing geo-spatial data with Python
  * Scikit-learn: machine learning in Python
  * Keras: deep learning in Python

====== Module 3: Laboratory for Interactive Project Development ======

During the course, teams of students will be guided in the development of a big data analytics project. The projects will be based on real-world datasets covering several thematic areas. Discussions and presentations in class will take place at different stages of the project:
  * 1st Mid Term: Data Understanding and Project Formulation
  * 2nd Mid Term: Model(s) construction and evaluation
  * 3rd Mid Term: Model interpretation/explanation
  * Exam: Final Project results

====== Calendar ======

15/09 (Mod. 1) Introduction to the course, The Big Data scenario {{ :bigdataanalytics:bda:lesson1_introduction_to_the_course_bda2021.pdf |}}

17/09 (Mod. 2) Python for Data Science and the Jupyter Notebook: developing open-source and reproducible data science
  * How to install Jupyter notebook: https://jupyter.readthedocs.io/en/latest/install.html
  * Python notebooks: http://bit.ly/bda2021_notebooks_1

===== Exam (Appelli) =====

TDA

====== Previous Big Data Analytics websites ======

[[bigdataanalytics:bda:bda2019|]]

[[bigdataanalytics:bda:bda2018|]]

[[bigdataanalytics:bda:bda2017|]]

[[bigdataanalytics:bda:bda2016|]]

[[bigdataanalytics:bda:bda2015|]]