Big Data Analytics A.A. 2017/18

Instructors:

Learning goals

Objective. In our digital society, every human activity is mediated by information technologies. Every activity therefore leaves behind digital traces that can be stored in some repository: phone call records, transaction records, web search logs, movement trajectories, social media texts and tweets. Every minute, humans produce an avalanche of “big data”, consciously or not, that represents a novel, accurate digital proxy of social activities at global scale. Big data provide an unprecedented “social microscope”, a novel opportunity to understand the complexity of our societies, and a paradigm shift for the social sciences. The objective of the course is twofold: (1) an introduction to the emergent field of big data analytics and social mining, aimed at acquiring and analyzing big data from multiple sources in order to discover the patterns and models of human behavior that explain social phenomena; and (2) an introduction to the technological scenario of scalable analytics.

Intro lectures

Lecture 1: Course presentation, course organization, Big Data landscape: opportunities, risks, big data sources, challenges.

Slides: https://goo.gl/WztPDg

Technologies lectures:

Lecture 1: Overview/Recall parallel computing. Slides: https://goo.gl/eCwz7G

Lecture 2: Introduction to Hadoop and Map-Reduce Patterns. Slides: https://goo.gl/kukSQx https://goo.gl/efVLKD

Lecture 3: HDFS and Spark (LAB). Slides: https://goo.gl/eD5p6c

Lecture 4-5-6: Data Analytics with Spark (LAB) (Last slides of Lecture 3 with exercises) https://goo.gl/AQJXhD

Lecture 7-8-9: Data Mining with Spark and MLlib (LAB). Slides: https://goo.gl/HJEQwT, Materials: https://goo.gl/VxAEhi

Methodological scenarios lectures:

Lecture 1-2: What is possible to observe with Mobile Phone Data? Formulation of novel questions to be answered: estimating population, understanding city dynamics, estimating unemployment or gender distribution, wellbeing; the complexity of feature construction; model construction; new mining algorithms; validation strategies.

Slides: https://goo.gl/fULiAu, https://goo.gl/UZEPdu

Lecture 3-4: What is possible to observe with GPS data? Formulation of novel questions to be answered: understanding human mobility; the complexity of feature construction; new model construction; new mining algorithms; validation strategies.

Slides: https://goo.gl/ztUvLd

Lecture 5-6: What is possible to observe with Social Media Data? Formulation of novel questions to be answered: understanding sentiment, wellbeing, happiness; the complexity of feature construction; new model construction; new mining algorithms; validation strategies.

Lecture 7: What is possible to observe with IoT Data? Formulation of novel questions to be answered: understanding performance in sport; the complexity of feature construction; new model construction; new mining algorithms; validation strategies.

Datasets

The datasets overview: https://goo.gl/fyAjth The datasets folder: https://goo.gl/nPd6HT

Solutions for the tech midterms are in the exercises folder of the datasets.

Calendar

18/09 - (Intro) Course Presentation, Big Data Landscape

22/09 - (Tech) Overview/Recall parallel computing

25/09 - (Method) What is possible to observe with Mobile Phone Data? (i)

29/09 - (Method) What is possible to observe with Mobile Phone Data? (ii)

02/10 - (Tech) Introduction to Hadoop and Design Patterns (Lab)

06/10 - Cancelled!

09/10 - (Tech) Managing HDFS and Introduction to Spark (Lab) and Datasets Presentation

13/10 - (Tech) Data Analytics with Spark (Lab)

16/10 - (Tech) Data Analytics with Spark (Lab)

20-23/10 - No Class (Time to practice!)

27/10 - (Tech) Data Analytics with Spark (Lab)

30/10 - Mid-term Tech I: starts at 16:30; you will have 1 hour and 30 minutes

06/11 - (Tech) Data Mining with Spark and MLlib (Lab) (i)

10/11 - (Method) What is possible to observe with GPS data? (i)

13/11 - (Tech) Data Mining with Spark and MLlib (Lab) (ii)

17/11 - (Method) What is possible to observe with GPS data? (ii)

20/11 - Discussing the final project proposal - Collective discussion (not evaluated)

24/11 - (Tech) Data Mining with Spark and MLlib (Lab) (iii)

27/11 - (Method) What is possible to observe with Social Media Data? (i)

01/12 - (Method) What is possible to observe with Social Media Data? (ii)

04/12 - (Method) What is possible to observe with GPS data? (iii)

11/12 - Cancelled due to weather

15/12 - Discussing the final project proposal - Collective discussion (not evaluated); (Method) What is possible to observe with IoT data? Examples from sport

18/12 - Mid-term Tech II

12/01 - 14:00 @ CNR (Entrance 20 - Room C36b) - Mid-term Tech part I and/or II (second chance; send an e-mail before 07/01 if you want to take it)

22/01 - 16/02 - Final Project and Discussion: 14:00 @ CNR (Entrance 20 - Room C40)

Exam

The two mid-terms count for 40% of the final grade; the remaining 60% comes from the evaluation of the project and the discussion (prepare some slides to present your project). Students whose mid-term results are not sufficient can take a final test on technologies.
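
As a rough illustration of the 40% / 60% split, the weighting can be sketched as below. This is only a sketch under stated assumptions: the page does not say how the two mid-terms are combined with each other, so equal weight is assumed here, and the function name and the example scores are hypothetical.

```python
def final_grade(midterm_1, midterm_2, project):
    """Combine three scores given on the same scale into a weighted grade.

    Assumption (not stated on this page): the two mid-terms count
    equally within their 40% share. The project and its discussion
    are treated as a single score worth 60%.
    """
    midterm_avg = (midterm_1 + midterm_2) / 2
    return 0.4 * midterm_avg + 0.6 * project


# Hypothetical scores, all on the same scale.
print(round(final_grade(24, 28, 30), 1))  # prints 28.4
```

Because the project share is the larger one, a change of one point in the project score moves the result by 0.6, while one point on a single mid-term moves it by 0.2 under the equal-weight assumption.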

The following table describes the expected content of a project:

Laboratories

Students should bring their own laptops (especially for technology lectures).

Software & Links

Virtual Machines:

bigdataanalytics/bda/bda2017.txt · Last modified: 04/11/2022 at 12:22 by Salvatore Ruggieri