Big Data Analytics A.A. 2018/19

Instructors:

Learning goals

In our digital society, every human activity is mediated by information technologies and therefore leaves digital traces behind. These massive traces are stored in public or private repositories: phone call records, movement trajectories, soccer logs, and social media records are all examples of “Big Data”, a novel and powerful “social microscope” for understanding the complexity of our societies. Analyzing big data sources is a complex task that requires knowledge of several technological and methodological tools. This course has three objectives:

  • introduce the emerging field of big data analytics and social mining;
  • introduce the technological landscape of big data, including programming tools to analyze big data, query NoSQL databases, and perform predictive modeling;
  • guide students in developing an open-source, reproducible big data analytics project based on the analysis of real-world datasets.

Module 1: Big Data Analytics and Social Mining

In this module, analytical methods and processes are presented through exemplary case studies in challenging domains, organized according to the following topics:

  • The Big Data Scenario and the new questions to be answered
  • Sport Analytics
    1. Soccer data landscape and injury prediction
    2. Analysis and evolution of sports performance
  • Mobility Analytics
    1. Mobility data landscape and mobility data mining methods
    2. Understanding Human Mobility with vehicular sensors (GPS)
    3. Mobility Analytics: Novel Demography with mobile-phone data
  • Social Media Mining
    1. The social media data landscape: Facebook, LinkedIn, Twitter, Last.fm
    2. Sentiment analysis: examples from human migration studies
    3. Discussion on ethical issues of Big Data Analytics
  • Well-being and Nowcasting
    1. Nowcasting influenza with retail market data
    2. Predicting well-being from human mobility patterns
  • Paper presentations by students

Module 2: Big Data Analytics Technologies

This module provides students with the technologies to collect, manipulate, and process big data. In particular, the following tools will be presented:

  • Python for Data Science
  • The Jupyter Notebook: developing open-source and reproducible data science
  • MongoDB: fast querying and aggregation in NoSQL databases
  • GeoPandas: analyze geo-spatial data with Python
  • Scikit-learn: programming tools for data mining and analysis
  • M-Atlas: a toolkit for mobility data mining

Module 3: Laboratory for Interactive Project Development

During the course, teams of students will be guided in the development of a big data analytics project. The projects will be based on real-world datasets covering several thematic areas. Class discussions and presentations will take place at different stages of the project:

  • Data Understanding and Project Formulation
  • Mid Term Project Results
  • Final Project Results

Calendar

17/09 (Mod. 1) Introduction to the course, The Big Data scenario (slides: mod1.introduction_bigdatalandscape_newquestions_.pdf)

21/09 (Mod. 1) Big Data Analytics: new questions to be solved + Presentation of datasets (list of datasets: http://bit.ly/bda_list_datasets, slides: http://bit.ly/bda18_datasets_slides)

24/09 (Mod. 2) Python for Data Science, The Jupyter Notebook: developing open-source and reproducible data science

28/09 (Mod. 2) Soccer data landscape and players’ injury prediction

01/10 (Mod. 2) Scikit-learn: programming tools for data mining and analysis. Data Sets presentation.

05/10 (Mod. 2) Analysis and evolution of sports performance

08/10 (Mod. 1) The mobility data landscape and mobility data mining methods

12/10 (Mod. 1) Soccer Data Challenge

15/10 (Mod. 1) Understanding Human Mobility with GPS

19/10 (Mod. 3) Data Understanding and Project Formulation

22/10 (Mod. 2) MongoDB: fast querying and aggregation in NoSQL databases

05/11 (Mod. 2) GeoPandas: analyze geo-spatial data with Python

09/11 (Mod. 1) Predicting well-being from human mobility patterns

12/11 (Mod. 1) Nowcasting influenza with retail market data

16/11 (Mod. 1) Paper presentations

19/11 (Mod. 1) Paper presentations

23/11 (Mod. 3) Mid Term Project Results

26/11 (Mod. 1) The social media data landscape and social media mining methods

30/11 No lessons

03/12 (Mod. 1) Sentiment analysis: examples from Human Migration studies

07/12 (Mod. 1) Discussion on Ethical issues in Big Data Analytics

10/12 (Mod. 3) Final Project Results

14/12 (Mod. 3) Final Project Results

12/01, 14:00 @ CNR (Entrance 20, Room C36b): Exam

Exam

The two mid-terms count for 40% of the final grade; the remaining 60% comes from the evaluation of the project and its discussion (prepare some slides to present your project). Students whose mid-term results are not sufficient may take a final test on the technologies.
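
As a rough illustration of the weighting above, the final grade could be computed as in the sketch below. The function name `final_grade`, the equal weighting of the two mid-terms, and the 30-point scale used in the example are assumptions made for illustration only; this page does not state how the two mid-terms are combined or which scale is used.

```python
def final_grade(midterm_1: float, midterm_2: float, project: float) -> float:
    """Combine grades using the 40% / 60% split described above.

    Assumptions (not stated on this page): all grades share one scale,
    and the two mid-terms are averaged with equal weight.
    """
    midterm_average = (midterm_1 + midterm_2) / 2
    return 0.4 * midterm_average + 0.6 * project


# Hypothetical grades on an assumed 30-point scale.
grade = final_grade(midterm_1=26, midterm_2=28, project=30)
print(round(grade, 1))  # 28.8
```

With these hypothetical grades, the mid-terms contribute 10.8 points and the project contributes 18.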

The following table describes the expected content of a project:

Previous Big Data Analytics websites

bigdataanalytics/bda/start.txt · Last modified: 20/09/2018 at 15:19 by Luca Pappalardo