====== LABORATORY OF DATA SCIENCE (2018/2019) ====== Teacher: * **Anna Monreale** * KDD Laboratory, Università di Pisa ed ISTI - CNR, Pisa * [[http://kdd.isti.cnr.it/homes/monreale/]] * [[anna.monreale@unipi.it]] * Office hours by appointment, Room 374/DO, Dept. of Computer Science. * Telephone +39-050-2213119 Teaching assistant: * **Roberto Pellungrini** * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa * [[roberto.pellungrini@di.unipi.it]] * Office hours: Thursday 14:30-16:30, Room 384/DO, Dept. of Computer Science. * Telephone +39-050-2212728 ====== News ===== * **[11-02-2019]:Results of the written exam of 06/02/2019 {{ :mds:lbi:2018-feb-results.pdf | Results and Oral dates}}** * [21-01-2019]:Results of the written exam of 16/01/2019 {{ :mds:lbi:results-jan19.pdf | Results and Oral dates}} * [27-12-2018]:Results of the second midterm {{ :mds:lbi:secondmidterm.pdf | Results and Oral dates}} * [01/12/2018]: Text of Exercises in MDX and Analytical SQL: {{ :mds:lbi:exercises_mdx.pdf |}} * [13-11-2018]:Results of the first midterm {{:mds:lbi:first_midterm_marks.pdf |Results}} * [07-11-2018]:Instructions for the SSAS project in the Lecture of today: to avoid conflicts in deployment/process follow this steps once the solution is opened: (1) rename the project as _foodmart; (2) from project properties select 'Deployment', then rename the database as _foodmart; (3) click on the button "show all files" just above "Solution explorer" right click on "view code" on the .database file that is visualized, and then change the ID from ruggieri_foodmart into _foodmart, and finally save the file; (4) change the credentials of connection to database on SQL Server. As an alternative solution you may[[ http://technet.microsoft.com/en-us/library/ms175630.aspx#bkmk_newusingwizard|import the project]] from the SSAS server and rename it as _foodmart (step 4 is still necessary). * [20-10-2018]:** {{ :mds:lbi:ex-midterm.pdf |Here}} you can find exercises simular to those you can find in the first mid-term. Please try to address them and on October 25, 2018 during the lesson we will discuss the solutions.** * [09-10-2018]: The lesson of Sept, 17 will be recovered on October 25, 2018 Room M * [09-09-2018]: Lessons will start on Monday, 24th. Please, see details below. ====== Hours and Rooms ====== **Classes ** Lessons will be held at: Polo Didattico "L. Fibonacci", Via F. Buonarroti 4, Pisa. ^ Day of Week ^ Hour ^ Room ^ | Monday | 09:00 - 11:00 | LAB M | | Tuesday| 11:00 - 13:00 | LAB M | **Office hours by appointment, Room 374/DO, Dept. of Computer Science. ** ====== Learning Material ====== ===== Slides & Registration of the classes ===== * The slides used in the course will be inserted in the calendar after each class. * Registration of each lecture will be published in the calendar after each class ===== Past Exams ===== * {{ :mds:lbi:2016midterm1text.pdf |2016/17 text}}, {{ :mds:lbi:2015fallmidterm1text.pdf | 2015/16 text}} and {{ :mds:lbi:2015wintermidterm1.zip | 2015/16 solution}}, {{:mds:lbi:2015midterm1text.pdf | 2014/15 text}} and {{ :mds:lbi:2015midterm1.zip |2014/2015 solution}}, {{ :mds:lbi:2014midterm1text.pdf | 2013/14 text}},{{ :mds:lbi:2013midterm1.pdf | 2012/13 text }} and {{ :mds:lbi:2013midterm1.zip |2012/13 solution}}. ===== Software===== * Anaconda with Python 3.5 * SQL Server 2016 Developer Edition: - Mandatory: [[https://docs.microsoft.com/en-us/sql/ssms/download-sql-server-management-studio-ssms | SQL Server 2016 Management Studio]] and [[https://docs.microsoft.com/en-us/sql/ssdt/download-sql-server-data-tools-ssdt | SQL Server 2016 Data Tools]] (no need to install Microsoft Visual Studio 2015! choose option 4 "Download SSDT as an ISO image") - Optional (not recommended on laptops): SQL Server 2016 Developer Edition can be [[https://www.microsoft.com/en-us/sql-server/application-development |downloaded from Microsoft]] or can be [[http://msdnaa.di.unipi.it/ | downloaded from MSDN-AA]]. During installation, set the following [[http://pages.di.unipi.it/ruggieri/teaching/_lbi/sql2016server.png| options]] as a minimum. * Microsoft Excel * [[https://powerbi.microsoft.com/it-it/desktop/| Power BI Desktop]] * WEKA: https://www.cs.waikato.ac.nz/ml/weka/ * WEKA API: Wrapper in Python - https://pypi.org/project/python-weka-wrapper/ ===== F.A.Q. ===== * [[http://www.sid.unipi.it/polo2/2015/03/26/connessione-alle-reti-wifi/ | Connection to wi-fi]] * [[http://www.sid.unipi.it/polo2/studenti/ | F.A.Q.s about the labs]] ====== Class calendar - (2018-2019) ====== ^ ^ Day ^ Topic ^ Slides ^ Registration ^ Data/Software ^ References ^ | | 17.09 09:00-11:00 | Canceled - The lesson will be recovered on **October 19, 2018 Room I h:11-13** | | | | | | 18.09 11:00-13:00 | Canceled - The lesson will be recovered on **October 25, 2018 Room M h:9-11 ** | | | | |1. | 24.09 09:00-11:00 | Introduction. File data access. Representation formats: CSV, FLV, ARFF, XML| {{ :mds:lbi:lds.01.introduction.pdf |}} {{ :mds:lbi:lds.02.bi_architectures.pdf|}} {{ :mds:lbi:lds.03.file_data_access.pdf |}}| [[http://lds.di.unipi.it/apa/video/2018-Video/2018-09-24.flv|Video 24/09/2018]] | | -** BI technology:** [[https://cacm.acm.org/magazines/2011/8/114953-an-overview-of-business-intelligence-technology/fulltext | An Overview of Business Intelligence Technology]] - **File access:** {{ :mds:lbi:filesystem.pdf | File System Interface}} - **File Formats:** [[http://www.stat.auckland.ac.nz/~paul/ItDT | Introduction to data technologies(Chps. 5, 6)]], [[http://weka.wikispaces.com/ARFF+(stable+version)|Weka ARFF Format]], [[http://weka.wikispaces.com/XRFF|XRFF Format]] | |2. | 25.09 11:00-13:00 | Python Recap | {{ :mds:lbi:lds.04.python.pdf | Python Recap}} | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-09-25.flv|Video 25/09/2018]] | | | |3. | 01.10 09:00-11:00 | File data access in Python. Lab practice on file access. | {{ :mds:lbi:lds.05.fileaccess-python.pdf |}} |[[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-01.flv|Video 01/10/2018]] | {{ :mds:lbi:data1.zip | Sample data}} {{ :mds:lbi:code-2018-09-25.zip |}}| | |4. | 02.10 11:00-13:00 | Lab practice on file access and transformation from CSV2ARFF file format. | |[[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-02.flv|Video 02/10/2018]] | {{ :mds:lbi:xmlelements2csv.zip |}} {{ :mds:lbi:csv2arff.zip |}} {{ :mds:lbi:code-2018-10-01.zip |}}| | |5. | 08.10 09:00-11:00 | Lab practice on file access. | |[[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-08.flv|Video 08/10/2018]] | {{ :mds:lbi:ex-customers.pdf |}} {{ :mds:lbi:data-customers.zip |}}| | |6. | 09.10 11:00-13:00 | RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC Programming. | {{ :mds:lbi:lbi.06.relationaldataaccess-1.pdf |}} |[[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-09.flv|Video 09/10/2018]] | | | |7. | 15.10 09:00-11:00 | Lab practice: stratified sampling in ODBC. | {{ :mds:lbi:lbi.06.relational_data_access-complete.pdf |}}|[[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-15.flv|Video 15/10/2018]] | {{ :mds:lbi:code-2018-10-15.zip |}}| | |8. | 16.10 11:00-13:00 | Introduction to SQL Server. ETL tools: SQL Server Integration Services (SSIS). | {{ :mds:lbi:lds.07.sqlserver.pdf |}} {{ :mds:lbi:lds.08.etlandssis.pdf |}}|[[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-16.flv|Video 16/10/2018]] | {{ :mds:lbi:stratifiedsampling.zip |}}| | |9. | 19.10 11:00-13:00 | SSIS samples and lab practice: update and pipeline. | | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-19.flv|Video 19/10/2018]]| {{ :mds:lbi:lds-ssis-samples.zip |}} {{ :mds:lbi:ex-midterm.pdf |}}| |10. | 22.10 09:00-11:00 | SSIS samples and lab practice: sampling, update, surrogate keys. | |[[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-22.flv|Video 22/10/2018]] | | | |11. | 23.10 11:00-13:00 | SSIS samples and lab practice: surrogate keys, slowly changing dimensions, Mid-term practice| | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-23.flv|Video 23/10/2018]]| {{ :mds:lbi:2016ssis.zip |}} | |12. | 25.10 09:00-11:00 | SSIS samples and lab practice: surrogate keys, slowly changing dimensions, Mid-term practice| | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-25.flv|Video 25/10/2018]]| {{ :mds:lbi:dissimilarity.zip |Dissimilarity.py}} {{ :mds:lbi:mdp.zip | MDP.py exam 14/4/2015 }} {{ :mds:lbi:siss-mdp.zip |}} {{ :mds:lbi:ssis-dissimilarityindex.zip |}} | |13. | 05.11 09:00-11:00 | Datawarehousing and OLAP recap. Data cubes, analytic SQL, and materialized views in SQL Server. | {{ :mds:lbi:lds.09.dwandolap.pdf |}} | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-11-05.flv|Video 05/11/2018 First Part]] [[http://lds.di.unipi.it/apa/video/2018-Video/2018-11-05-2.flv|Video 05/11/2018 Second Part]]|{{ :mds:lbi:lbi.08.afdemo.sql.zip |}} |For DW and OLAP: [[http://pages.di.unipi.it/ruggieri/teaching/dsd/|Decision support databases]] course lecture notes. | |14. | 07.11 11:00-13:00 |OLAP with SQL Server Analysis Services (SSAS): data source views, dimensions, hierarchies. Data cubes.| {{ :mds:lbi:lds.10.ssas.pdf |}} | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-11-06.flv|Video 06/11/2018]]| {{ :mds:lbi:monreale_foodmart.zip |}} **Notice:** Please read the instructions in the Section NEWS! | **1) SSAS (olap):** [[http://msdn.microsoft.com/en-us/library/bb522607.aspx|documentation]]; 2) S. Harinath et al. {{ :mds:lbi:ssas2012ch456.pdf |Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chps. 4-6}}. | |15.|12.11 09:00-11:00 |Parent-child hierarchies. OLAP explorative data analysis with Pivot Tables in Excel. | | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-11-12-1Part.flv|Video 12/11/2018 First Part]] [[http://lds.di.unipi.it/apa/video/2018-Video/2018-11-12-13-2Part.flv|Video 12-13/11/2018]] | | **Pivot Tables in Excel:** G. Harvey. {{ :mds:lbi:pivottable2013bookviichpt2.pdf |Excel 2013 All-in-One For Dummies, 2013. Chp. VII-2}}. | |16.|13.11 11:00-13:00 |Calculated metrics. ROLAP and MOLAP in SSAS. | |The Video of the previous lecture includes also the topic of this lecture. | {{ :mds:lbi:foodmartexplorative.xlsx |}} | **MDX:** 1) [[http://msdn.microsoft.com/en-us/library/bb500184.aspx|documentation]] and a [[https://www.mssqltips.com/sqlservertip/3129/order-and-sort-with-mdx-in-sql-server-analysis-services/|useful guide on ordering]]; 2) S. Harinath ed al. {{ :mds:lbi:ssas2012ch3.pdf |Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chp. 3.}} | |17.|19.11 09:00-11:00 |Practice with MDX. | | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-11-19.flv|Video 19/11/2018]] | {{ :mds:lbi:lbi.09.mdxsample.mdx.zip |}} | | |18.|20.11 11:00-13:00 |Practice with MDX. | | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-11-20.flv|Video 20/11/2018]] |{{ :mds:lbi:lbi.09.mdxpractice.mdx.zip |}} | | |19.|26.11 09:00-11:00 |Practice with MDX. | | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-11-26.flv|Video 26/11/2018]]| {{ :mds:lbi:20170208.pdf |}} | | |20.|27.11 11:00-13:00 | Reporting with Power BI Desktop. Data Mining pre-processing in WEKA. | {{ :mds:lbi:lds.12.powerbi.pdf |}} {{ :mds:lbi:lds.13.weka.pdf |}}| [[http://lds.di.unipi.it/apa/video/2018-Video/2018-11-27.flv|Video 27/11/2018]] | {{ :mds:lbi:weka.3.7.9.light.zip |}}{{ :mds:lbi:wekapatch.zip |}} | | |21.|03.12 09:00-11:00 |WEKA Classification. | {{ :mds:lbi:meta-cost-classification.pdf |}} | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-12-03.flv|Video 03/12/2018]]| {{ :mds:lbi:lsd.practice.ee.pdf |}}{{ :mds:lbi:ee.zip | Data-ee}} | | |22.|04.12 11:00-13:00 | WEKA Classification (practice) & AR | {{ :mds:lbi:lds.14.associationrules.pdf |}} | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-12-04-1.flv|Video 1 04/12/2018]] [[http://lds.di.unipi.it/apa/video/2018-Video/2018-12-04-2.flv|Video 2 04/12/2018]] [[http://lds.di.unipi.it/apa/video/2018-Video/2018-12-04-3.flv|Video 3 04/12/2018]]| {{ :mds:lbi:lds.practicesolution.ee.pdf |}} | | |23.|10.12 09:00-11:00 |WEKA AR & Practice. Weka API. | {{ :mds:lbi:lds.15.wekaapi.pdf |}} | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-12-10-1.flv|Video 1 10/12/2018]] [[http://lds.di.unipi.it/apa/video/2018-Video/2018-12-10-2.flv|Video 2 10/12/2018]]| {{ :mds:lbi:wekaapi-example.zip | Python example for WEKA API}} | | |22.|11.12 11:00-13:00 | Practice for the second midterm| | | {{ :mds:lbi:exercises-2midterm.zip | Queries sec. Midterm}} {{ :mds:lbi:weka-ex.zip | Weka practice}} {{ :mds:lbi:20140205.pdf | Exercise on MDX}}| | ====== Exams ====== ===== Mid-term exams ===== **Rule: ** Students may do the second mid-term even if they did have the first mid-term. ^ ^ Date ^ Hour ^ Room^ Notes ^ Marks ^ |29.10.2018| 09:00 - 12:00| Room M | | | |17.12.2018| 09:00 - 12:00| Room M | | | ===== Exam sessions ===== **Rule:** Students having at least one mid-term exam may do only one part of the written exam in the exam sessions. ^ Session ^ Date ^ Time ^ Room ^ Notes ^ Marks ^ |1.|16.01.2019| 09:00 - 12:00| Room M | | | |2.|06.02.2019| 09:00 - 12:00| Room M | | | |3.|18.06.2019| 09:00 - 13:00| Room H | Oral Exam on DM1 within 15 July. If you cannot do within that date you can do the oral exam on September.| | |4.|09.07.2019| 09:00 - 13:00| Room H |Oral Exam on DM1 within 15 July. If you cannot do within that date you can do the oral exam on September. | | =====Extra sessions A.A. 2017/18===== ^ Date ^ Time ^ Room ^ Notes ^ Results ^ |29.10.2018| 09:00 - 12:00| Room M | | | =====Past Editions ===== * [[LDS 2018-2019]] * [[LBI 2017-2018]]