====== LABORATORY OF DATA SCIENCE (2019/2020) ======

Teacher:
  * **Anna Monreale**
    * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa
    * [[http://kdd.isti.cnr.it/homes/monreale/]]
    * [[anna.monreale@unipi.it]]
    * Office hours: Monday 9:00-11:00 or by appointment, Room 374/DO, Dept. of Computer Science.
    * Telephone +39-050-2213119

Teaching assistant:
  * **Roberto Pellungrini**
    * KDD Laboratory, Università di Pisa and ISTI - CNR, Pisa
    * [[roberto.pellungrini@di.unipi.it]]
    * Office hours: Wednesday 14:30-16:30, Room 384/DO, Dept. of Computer Science.
    * Telephone +39-050-2212728

====== News ======

  * **[27-12-2019]: Results of the second mid-term test and proposed schedule for the oral exam: {{ :mds:lbi:results-second-midtermtest.pdf |}}. If the proposed date and/or time does not suit you, please send me an email.**
  * [26-11-2019]: Additional lectures: Friday, Nov 29, 14-16, room L1 and Friday, Dec 06, 14-16, room M.
  * [23-11-2019]: Results of the first mid-term test: {{:mds:lbi:results-first-mid-test.pdf | Results}}
  * [21-11-2019]: Instructions for the SSAS project used in today's lecture. To avoid conflicts during deployment/processing, follow these steps once the solution is open: (1) rename the project as _foodmart; (2) in the project properties, select 'Deployment' and rename the database as _foodmart; (3) click the "Show All Files" button just above "Solution Explorer", right-click the .database file that appears and select "View Code", change the ID from ruggieri_foodmart to _foodmart, and save the file; (4) change the credentials of the connection to the database on SQL Server. Alternatively, you may [[http://technet.microsoft.com/en-us/library/ms175630.aspx#bkmk_newusingwizard|import the project]] from the SSAS server and rename it as _foodmart (step 4 is still necessary).
  * [18-11-2019]: The lesson of Tuesday 19/11/2019 will be canceled.
  * [02-11-2019]: Since the mid-term test will take place on 5 Nov, the Thursday lesson next week will be canceled.
  * [31-10-2019]: On November 4, 11:00-13:00, in Room C, I am organising an additional lesson dedicated to practice for the written exam.
  * [02-10-2019]: Instructions for installing the Microsoft tools are available in the Software section.
  * [09-09-2019]: Lessons will start on Tuesday, September 24. Please see the details below.

====== Hours and Rooms ======

**Classes**

Lessons will be held at: Polo Didattico "L. Fibonacci", Via F. Buonarroti 4, Pisa.

^ Day of Week ^ Hour ^ Room ^
| Tuesday | 11:00 - 13:00 | LAB M |
| Thursday | 11:00 - 13:00 | LAB M |

**Office hours by appointment, Room 374/DO, Dept. of Computer Science.**

====== Learning Material ======

===== Slides & Recordings of the Classes =====

  * The slides used in the course will be added to the calendar after each class.
  * The recording of each lecture will be published in the calendar after each class.

===== Past Exams =====

  * {{ :mds:lbi:2016midterm1text.pdf |2016/17 text}}, {{ :mds:lbi:2015fallmidterm1text.pdf | 2015/16 text}} and {{ :mds:lbi:2015wintermidterm1.zip | 2015/16 solution}}, {{:mds:lbi:2015midterm1text.pdf | 2014/15 text}} and {{ :mds:lbi:2015midterm1.zip |2014/15 solution}}, {{ :mds:lbi:2014midterm1text.pdf | 2013/14 text}}, {{ :mds:lbi:2013midterm1.pdf | 2012/13 text }} and {{ :mds:lbi:2013midterm1.zip |2012/13 solution}}.

===== Software =====

  * Anaconda with Python 3.5
  * SQL Server 2016 Developer Edition: [[https://docs.microsoft.com/it-it/sql/ssms/release-notes-ssms?view=sql-server-2017#downloadssdtmediadownloadpng-ssms-1791httpsgomicrosoftcomfwlinklinkid2043154clcid0x409 | SQL Server 2016 Management Studio or SQL Server 2017 Management Studio]] and [[https://docs.microsoft.com/en-us/sql/ssdt/previous-releases-of-sql-server-data-tools-ssdt-and-ssdt-bi?view=sql-server-2017 | SQL Server 2016 Data Tools]].
    * For Data Tools, I suggest installing SSDT for VS2015 17.4, which is the same version installed on the laboratory computers.
    * Note: it is mandatory to install **Integration Services** and **Analysis Services**, so you must select these two components during installation.
  * Instructions for SQL Server will be available soon.
    * Optional (not recommended on laptops): SQL Server 2016 Developer Edition can be downloaded from Microsoft or [[http://msdnaa.di.unipi.it/ | from MSDN-AA]].
  * Microsoft Excel
  * [[https://powerbi.microsoft.com/it-it/desktop/| Power BI Desktop]]
  * WEKA: https://www.cs.waikato.ac.nz/ml/weka/
  * WEKA API: Python wrapper - https://pypi.org/project/python-weka-wrapper/

===== F.A.Q. =====

  * [[http://www.sid.unipi.it/polo2/2015/03/26/connessione-alle-reti-wifi/ | Connection to wi-fi]]
  * [[http://www.sid.unipi.it/polo2/studenti/ | F.A.Q.s about the labs]]

====== Class calendar (2019-2020) ======

^ ^ Day ^ Topic ^ Slides ^ Recording ^ Data/Software ^ References ^
| | 17.09 11:00-13:00 | Canceled. The lesson will be made up later. | | | | |
| | 19.09 11:00-13:00 | Canceled. The lesson will be made up later. | | | | |
|1. | 24.09 11:00-13:00 | Introduction. File data access. Representation formats: CSV, FLV, ARFF, XML | {{ :mds:lbi:2019-lds.01.introduction.pdf |}} {{:mds:lbi:2019-lds.02.bi_architectures.pdf |}} {{:mds:lbi:lds.03.file_data_access.pdf |}} | [[http://lds.di.unipi.it/apa/video/2019-Video/2019-Introduction.flv|Video on Introduction]] [[http://lds.di.unipi.it/apa/video/2019-Video/2019-FileAccess.flv|Video on File Access]] | | **BI technology:** [[https://cacm.acm.org/magazines/2011/8/114953-an-overview-of-business-intelligence-technology/fulltext | An Overview of Business Intelligence Technology]] - **File access:** {{ :mds:lbi:filesystem.pdf | File System Interface}} - **File formats:** [[http://www.stat.auckland.ac.nz/~paul/ItDT | Introduction to Data Technologies (Chps. 5, 6)]], [[http://weka.wikispaces.com/ARFF+(stable+version)|Weka ARFF Format]], [[http://weka.wikispaces.com/XRFF|XRFF Format]] |
|2. | 26.09 11:00-13:00 | Python recap | {{ :mds:lbi:lds.04.python.pdf | Python Recap}} | [[http://lds.di.unipi.it/apa/video/2019-Video/2019-PythonRecap.flv|Video 26/09/2019]] | | |
|3. | 01.10 11:00-13:00 | File data access in Python. Lab practice on file access. | {{ :mds:lbi:lds.05.fileaccess-python.pdf |}} | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-01.flv|Video File Access - Python]] | {{ :mds:lbi:data.zip | Sample data}} {{ :mds:lbi:code-2019-09-26.zip |}} | |
|4. | 03.10 11:00-13:00 | Lab practice on file access and conversion from CSV to ARFF format. | | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-02.flv|Video CSV2ARFF]] | {{ :mds:lbi:code-2019-10-1.zip |}} | |
|5. | 08.10 11:00-13:00 | Lab practice on file access. | | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-08.flv|Video]] | {{ :mds:lbi:ex-customers.pdf |}} {{ :mds:lbi:data-customers.zip |}} {{ :mds:lbi:lds.file.format.zip |}} | |
|6. | 10.10 11:00-13:00 | Practice + RDBMS access protocols: ODBC, OLE DB, JDBC. ODBC programming. | {{ :mds:lbi:lbi.06.relationaldataaccess-1.pdf |}} | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-09.flv|Video on RDBMS access - Part 1]] | {{ :mds:lbi:2018-10-09-ex.zip | Solution Ex. 2018-10-09}} | |
|7. | 15.10 11:00-13:00 | Lab practice: stratified sampling in ODBC. | {{ :mds:lbi:lbi.06.relational_data_access-complete.pdf |}} | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-15.flv|Video on RDBMS access - Part 2]] | {{ :mds:lbi:code-db-samples.zip |}} | |
|8. | 17.10 11:00-13:00 | Introduction to SQL Server. ETL tools: SQL Server Integration Services (SSIS). | {{ :mds:lbi:lds.07.sqlserver.pdf |}} {{ :mds:lbi:lds.08.etlandssis.pdf |}} | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-16.flv|Video on the stratified sampling solution and ETL tools]] | {{ :mds:lbi:stratifiedsampling.zip |}} | |
|9. | 22.10 11:00-13:00 | SSIS samples and lab practice pipeline. | | [[http://lds.di.unipi.it/apa/video/2018-Video/2018-10-19.flv|Video on SSIS]] | {{ :mds:lbi:lds-ssis-samples.zip |}} {{ :mds:lbi:ex-midterm.pdf |}} | |
|10. | 24.10 11:00-13:00 | SSIS: dissimilarity. Mid-term practice. | | | {{ :mds:lbi:dissimilarity.zip |Dissimilarity.py}} {{ :mds:lbi:mdp.zip | MDP.py exam 14/4/2015 }} {{ :mds:lbi:siss-mdp.zip |}} {{ :mds:lbi:ssis-dissimilarityindex.zip |}} | |
|11. | 29.10 11:00-13:00 | Stratified sampling + update. | | | {{ :mds:lbi:stratifiedsampling.zip |}} Exercises: {{ :mds:lbi:20190618.pdf |}} {{ :mds:lbi:20190401.pdf |}} | |
|12. | 31.10 11:00-13:00 | Practice for the mid-term {{ :mds:lbi:20190206.pdf |}} | | | {{ :mds:lbi:progettohealty_food.zip |}} {{ :mds:lbi:lbi06022019.py.zip |}} | |
|13. | 04.11 11:00-13:00 | Practice for the mid-term | | | {{ :mds:lbi:exercises-siss.zip |}} {{ :mds:lbi:esercizio4112019.zip | Ex. Python}} | |
|14. | 12.11 11:00-13:00 | SSIS: surrogate keys, slowly changing dimensions. | | [[http://lds.di.unipi.it/apa/video/2019-Video/2019-11-12.flv|Video 2019-11-12]] | {{ :mds:lbi:2016ssis.zip |}} | |
|15. | 14.11 11:00-13:00 | Data warehousing and OLAP recap. Data cubes, analytic SQL, and materialized views in SQL Server. | {{ :mds:lbi:lds.09.dwandolap.pdf |}} | [[http://lds.di.unipi.it/apa/video/2019-Video/2019-11-14.flv|Video 2019-11-14]] | {{ :mds:lbi:lbi.08.afdemo.sql.zip |}} | |
| | 19.11 11:00-13:00 | Canceled | | | | |
|16. | 21.11 11:00-13:00 | OLAP with SQL Server Analysis Services (SSAS): data source views, dimensions, hierarchies. Data cubes. | {{ :mds:lbi:lds.10.ssas.pdf |}} | [[http://lds.di.unipi.it/apa/video/2019-Video/2019-11-21-1.flv|First Video 21/11/2019]] [[http://lds.di.unipi.it/apa/video/2019-Video/2019-11-21.flv|Second Video 21/11/2019]] | {{ :mds:lbi:monreale_foodmart.zip |}} **Notice:** Please read the instructions in the News section! | **SSAS (OLAP):** 1) [[http://msdn.microsoft.com/en-us/library/bb522607.aspx|documentation]]; 2) S. Harinath et al. {{ :mds:lbi:ssas2012ch456.pdf |Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chps. 4-6}}. |
|17. | 26.11 11:00-13:00 | Parent-child hierarchies. OLAP exploratory data analysis with Pivot Tables in Excel. | | [[http://lds.di.unipi.it/apa/video/2019-Video/2019-11-26-1.flv|Video 26/11/2019]] | | **Pivot Tables in Excel:** G. Harvey. {{ :mds:lbi:pivottable2013bookviichpt2.pdf |Excel 2013 All-in-One For Dummies, 2013. Chp. VII-2}}. |
|18. | 28.11 11:00-13:00 | ROLAP and MOLAP in SSAS. MDX. | | [[http://lds.di.unipi.it/apa/video/2019-Video/2019-11-28.flv|Video 28/11/2019]] | | **MDX:** 1) [[http://msdn.microsoft.com/en-us/library/bb500184.aspx|documentation]] and a [[https://www.mssqltips.com/sqlservertip/3129/order-and-sort-with-mdx-in-sql-server-analysis-services/|useful guide on ordering]]; 2) S. Harinath et al. {{ :mds:lbi:ssas2012ch3.pdf |Professional Microsoft SQL Server Analysis Services 2012 with MDX and DAX, Wrox publisher, 2012. Chp. 3.}} |
|19. | 29.11 11:00-13:00 | Calculated metrics. MDX demo. | | [[http://lds.di.unipi.it/apa/video/2019-Video/2019-11-29-Excel.flv|Video on Excel report]] [[http://lds.di.unipi.it/apa/video/2019-Video/2019-11-29-MDX.flv|Video on MDX queries]] | {{ :mds:lbi:foodmartexplorative.xlsx |}} {{ :mds:lbi:monreale_foodmart_2.zip |}} | |
|20. | 03.12 11:00-13:00 | Practice with MDX. | | This part is covered by the previous video. | {{ :mds:lbi:lbi.09.mdxsample.mdx.zip |}} | |
|21. | 05.12 11:00-13:00 | Practice with MDX. | | [[https://kdd.isti.cnr.it/sites/default/files/video/2019-12-05.flv|Video 5/12/19]] | {{ :mds:lbi:lbi.09.mdxpractice.mdx.zip |}} | |
|22. | 06.12 11:00-13:00 | Reporting with Power BI Desktop. Data mining pre-processing in WEKA. | {{ :mds:lbi:lds.12.powerbi.pdf |}} {{ :mds:lbi:lds.13.weka.pdf |}} | [[https://kdd.isti.cnr.it/sites/default/files/video/2019-12-6.flv|Video 6/12/19]] | {{ :mds:lbi:weka.3.7.9.light.zip |}} {{ :mds:lbi:wekapatch.zip |}} {{ :mds:lbi:exercises-2midterm.txt.zip |}} | |
|23. | 10.12 11:00-13:00 | WEKA API | {{ :mds:lbi:lds.15.wekaapi.pdf |}} {{ :mds:lbi:lsd.practice.ee-2019.pdf|}} | [[http://lds.di.unipi.it/apa/video/2019-Video/2019-12-10.flv|Video 10/12/2019]] | {{:mds:lbi:ee_dataset.zip | training set for exercise on Weka}} {{ :mds:lbi:ee_validation.arff.zip | validation set for exercise on Weka }} {{:mds:lbi:wekaapi-example.zip | Python example for WEKA API}} | |
|24. | 12.12 11:00-13:00 | Practice for the second mid-term | | | {{ :mds:lbi:ex-mdx.pdf | Queries to solve with MDX (a more complete version of the file published in the previous lecture) }} {{ :mds:lbi:20140205.pdf | Exercise on MDX}} {{ :mds:lbi:2019-exercises-2midterm.zip |Solution Ex.}} | |

====== Exams ======

===== Mid-term exams =====

**Rule:** Students may take the second mid-term test even if they did not take the first one.

^ Date ^ Hour ^ Room ^ Notes ^ Marks ^
|5/11/2019 | 14:00 | H | | |
|17/12/2019 | 14:00 | M | | |

===== Exam sessions =====

**Rule:** Students who took at least one mid-term test may take only one part of the written exam during the exam sessions.

^ Session ^ Date ^ Time ^ Room ^ Notes ^ Marks ^

===== Extra sessions, Academic Year 2018/19 =====

^ Date ^ Time ^ Room ^ Notes ^ Results ^
|5/11/2019 | 14:00 | H | | |

===== Past Editions =====

  * [[LDS 2019-2020]]
  * [[LDS 2018-2019]]
  * [[LBI 2017-2018]]