
Big Data Analytics A.A. 2020/21

NOTICE: ON MONDAY, SEPTEMBER 21st, ALL LESSONS ARE SUSPENDED BECAUSE OF ELECTION DAY IN ITALY.

WARNING: All lectures of the First Semester of the academic year 2020/21, until 31/12/2020, will be delivered exclusively online through the Microsoft Teams team named “599AA 20/21 - BIG DATA ANALYTICS [WDS-LM]” (https://bit.ly/35yJ65c).

Instructors:

Timetable (http://bit.ly/unipi_timetable_2020)

  • Monday 16:15 - 18:00 Aula WDS/1
  • Tuesday 16:15 - 18:00 Aula WDS/1

Fill in the Doodle poll with your preferred days and times during the week (ignore the dates; only the day of the week and the time of day matter): https://doodle.com/poll/bwt8aa5zyczn8p6d

Course pre-registration: by Wednesday, September 16th, fill in the form with your name and surname, email, skills, and languages (the responses will help us build teams): https://forms.gle/tzxKRP4aidKBpk8E9

Team registration: form teams of 3 or 4 students and register your team here by September 23rd: https://forms.gle/rbsV4dF6RuAnCBWz9

Learning goals

In our digital society, every human activity is mediated by information technologies and therefore leaves digital traces behind. These massive traces are stored in public or private repositories: phone call records, movement trajectories, soccer logs, and social media records are all examples of “Big Data”, a novel and powerful “social microscope” for understanding the complexity of our societies. Analyzing big data sources is a complex task that requires knowledge of several technological and methodological tools. This course has three objectives:

  • to introduce the emerging field of big data analytics and social mining;
  • to introduce the technological landscape of big data, such as programming tools to analyze big data, query NoSQL databases, and perform predictive modeling;
  • to guide students through the development of an open-source, reproducible big data analytics project based on the analysis of real-world datasets.

Module 1: Big Data Analytics and Social Mining

In this module, analytical methods and processes are presented through exemplary case studies in challenging domains, organized into the following topics:

  • The Big Data Scenario and the new questions to be answered
  • Sport Analytics:
    1. Soccer data landscape and injury prediction
    2. Analysis and evolution of sports performance
  • Mobility Analytics:
    1. Mobility data landscape and mobility data mining methods
    2. Understanding Human Mobility with vehicular sensors (GPS)
    3. Mobility Analytics: Novel Demography with mobile-phone data
  • Social Media Mining:
    1. The social media data landscape: Facebook, LinkedIn, Twitter, Last.fm
    2. Sentiment analysis: an example from human migration studies
    3. Discussion on ethical issues of Big Data Analytics
  • Well-being & Nowcasting:
    1. Nowcasting influenza with retail market data
    2. Predicting well-being from human mobility patterns
  • Paper presentations by students
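Several of the mobility topics above build on simple individual measures computed from GPS or mobile-phone trajectories. As an illustration that is not taken from the course material, the sketch below computes the radius of gyration, i.e. the root-mean-square distance of a user's recorded positions from their center of mass, r_g = sqrt((1/n) Σ d(r_i, r_cm)²). It assumes a trajectory is a list of (latitude, longitude) pairs in degrees, approximates the center of mass by the mean coordinates (reasonable for trajectories spanning a small area), and uses haversine great-circle distances; libraries such as scikit-mobility provide ready-made and more complete mobility measures.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius used by the haversine formula

LatLng = Tuple[float, float]


def haversine_km(p: LatLng, q: LatLng) -> float:
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    lat1, lng1 = map(math.radians, p)
    lat2, lng2 = map(math.radians, q)
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def radius_of_gyration(points: Sequence[LatLng]) -> float:
    """Root-mean-square distance (km) of the points from their center of mass."""
    if not points:
        raise ValueError("a trajectory needs at least one point")
    n = len(points)
    # Assumption: center of mass approximated by the arithmetic mean of lat/lng.
    center = (sum(lat for lat, _ in points) / n, sum(lng for _, lng in points) / n)
    squared = [haversine_km(p, center) ** 2 for p in points]
    return math.sqrt(sum(squared) / n)
```

For example, `radius_of_gyration([(43.7160, 10.4010), (43.7230, 10.3966), (43.7085, 10.4036)])` returns, in km, how spread out three hypothetical GPS fixes in Pisa are; users who travel farther from their usual places get a larger value.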

Module 2: Big Data Analytics Technologies

This module introduces students to the technologies used to collect, manipulate, and process big data. In particular, the following tools will be presented:

  • Python for Data Science
  • The Jupyter Notebook: developing open-source and reproducible data science
  • MongoDB: fast querying and aggregation in NoSQL databases
  • GeoPandas: analyze geo-spatial data with Python
  • Scikit-learn: machine learning in Python
  • Keras: deep learning in Python
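The MongoDB aggregation framework expresses a query as a pipeline of stages, each transforming the stream of documents produced by the previous one. As a hedged illustration, the function below builds a pipeline that counts call records per caller within a time window and keeps the most active callers. The function name and the field names `timestamp` and `caller_id` are hypothetical, chosen only to echo the phone call records mentioned above; the stage operators (`$match`, `$group`, `$sum`, `$sort`, `$limit`) are standard MongoDB syntax.

```python
from datetime import datetime
from typing import Any, Dict, List


def calls_per_user_pipeline(start: datetime, end: datetime, limit: int = 10) -> List[Dict[str, Any]]:
    """Hypothetical pipeline: number of call records per caller in [start, end)."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return [
        # Keep only documents whose (hypothetical) "timestamp" field is in the window.
        {"$match": {"timestamp": {"$gte": start, "$lt": end}}},
        # Group by the (hypothetical) "caller_id" field and count documents per group.
        {"$group": {"_id": "$caller_id", "n_calls": {"$sum": 1}}},
        # Most active callers first.
        {"$sort": {"n_calls": -1}},
        # Return at most `limit` callers.
        {"$limit": limit},
    ]
```

With PyMongo, such a pipeline would be passed to `Collection.aggregate`, e.g. `MongoClient("mongodb://localhost:27017")["bda"]["calls"].aggregate(pipeline)`, which returns a cursor over the result documents; the database name `bda` and collection name `calls` are likewise hypothetical. Putting `$match` first lets MongoDB discard irrelevant records (and use an index on `timestamp`, if one exists) before the more expensive grouping step.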

Module 3: Laboratory for Interactive Project Development

During the course, teams of students will be guided through the development of a big data analytics project. The projects will be based on real-world datasets covering several thematic areas. Discussions and presentations will take place in class at different stages of the project.

  • 1st Mid Term: Data Understanding and Project Formulation
  • 2nd Mid Term: Model(s) construction and evaluation
  • 3rd Mid Term: Model interpretation/explanation
  • Exam: Final Project results

Calendar

14/09 (Mod. 1) Introduction to the course; The Big Data scenario (lesson1_introduction_to_the_course_bda2021.pdf)

15/09 (Mod. 2) Python for Data Science and the Jupyter Notebook: developing open-source and reproducible data science

21/09 No Lesson (Election Day in Italy)

22/09

28/09 (Mod. 2) GeoPandas and scikit-mobility: analyzing trajectory data in Python (geopandas.zip)

29/09 (Mod. 2) PyMongo and MongoDB: fast querying and aggregation in NoSQL databases (mongodb.zip)

05/10 (Mod. 1) Soccer data landscape and injury prediction

06/10 No Lesson (SocInfo2020 conference)

12/10 (Mod. 1) Performance evaluation: from human evaluations to data-driven algorithms

13/10 (Mod. 1) Nowcasting well-being with Big Data

19/10 (Mod. 3) 1st Mid Term - first group of teams

20/10 (Mod. 3) 1st Mid Term - second group of teams

26/10 (Mod. 3) Discussion and group working on projects

27/10 (Mod. 3) Discussion and group working on projects

02/11 (Mod. 1) Forecasting influenza with retail market data

03/11 (Mod. 1) Trustworthy data mining

16/11 (Mod. 3) 2nd Mid Term - first group of teams

17/11 (Mod. 3) 2nd Mid Term - second group of teams

23/11 (Mod. 3) Discussion and group working on projects

24/11 (Mod. 3) Discussion and group working on projects

30/11 (Mod. 3) Paper presentations

01/12 (Mod. 3) Paper presentations

07/12 (Mod. 3) 3rd Mid Term - first and second group of teams

Exam

TBC

Previous Big Data Analytics websites

Last modified: 16/09/2020 at 14:54 by Luca Pappalardo