mds:txa:start

Differences

These are the differences between the selected revision and the current version of the page.

mds:txa:start [24/09/2018 at 16:27 (6 years ago)] – [Lecture Notes] Giuseppe Attardi
mds:txa:start [08/08/2024 at 12:41 (6 weeks ago)] (current version) – previous version restored (20/12/2022 at 09:37 (20 months ago)) Salvatore Ruggieri
Line 1: Line 1:
-====== Text Analytics A.Y. 2018/19 ======
+====== Text Analytics (635AA) A.Y. 2022/23 ======
  
-=== Teachers === 
  
-  * [[http://www.di.unipi.it/~attardi|Giuseppe Attardi]]
-  * [[http://www.esuli.it/|Andrea Esuli]]
+==== Teacher ====
+
 +[[http://luciacpassaro.github.io/|Lucia Passaro]] (lucia.passaro [at] unipi [dot] it) 
 + 
 +Office hours: Monday 16-18 via [[https://teams.microsoft.com/l/chat/0/0?users=lucia.passaro@unipi.it|Teams]] 
 + 
 + 
 +==== Schedule ====
  
-^  Schedule  ^^^ 
 ^ Day ^ Hour ^ Room ^
-| Monday | 11-13 | X1, Polo Fibonacci |
-| Tuesday | 9-11 | X1, Polo Fibonacci |
+| Monday | 9-11 | Fib M1 |
+| Friday | 11-13 | Fib M1 |
  
-=== Forum === 
-Forum on [[https://piazza.com/class/jlm3aqcshik3sm|Piazza]] 
  
 +[[https://teams.microsoft.com/l/team/19%3au_2NWnfXHAGPknxec1GtEY5y8UrjGRSAQjuJ1tySJ7w1%40thread.tacv2/conversations?groupId=414d90af-3f1a-4188-9dd7-21ac607e5c1f&tenantId=c7456b31-a220-47f5-be52-473828670aa1|Team of the class]]
  
 ==== Objectives ====
-The course targets text analytics systems and applications to respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form. The objective is to learn to recognize situations in which text analytics techniques can solve information processing needs, to identify the analytic task/process that best models the business problem, to select the most appropriate resources, methods and tools, to collect text data and apply such methods to them. Several application contexts will be presented: information extraction, sentiment analysis (what is the nature of commentary on an issue), spam and fake posts detection, quantification problems, summarization, etc.
+The course targets text analytics systems and applications to respond to business problems by discovering and presenting knowledge that is otherwise locked in textual form.
 +The main objectives of the course are: 
 +  - Learning essential techniques, algorithms, and models used in natural language processing. 
 +  - Understanding of the architectures of typical text analytics applications and of libraries for building them.  
 +  - Expertise in design, implementation, and evaluation of applications that exploit analysis, interpretation, and transformation of texts. 
 + 
 + 
 +==== Background ==== 
 + 
 +  * Background: Natural Language Processing, Information Retrieval and Machine Learning 
 +  * Mathematical background: Probability, Statistics and Algebra 
 +  * Linguistic essentials: words, lemmas, morphology, Part of Speech (PoS), syntax 
 +  * Basic text processing: regular expression, tokenisation
 +  * Data collection: scraping 
 +  * Basic modelling: collocations, language models 
 +  * Introduction to Machine Learning: theory and practical tips 
 +  * Libraries and tools: NLTK, spaCy, Keras, PyTorch
 +  * Classification/Clustering 
 +  * Sentiment Analysis/Opinion Mining 
 +  * Information Extraction/Relation Extraction/Entity Linking 
 +  * Transfer learning 
 +  * Quantification 
 + 
 +==== Lecture Notes ==== 
 + 
 +^ Date ^ Lecture ^ Slides ^ Material / Reference ^ 
 +| 2022/09/16 | Introduction to the course. NLP & Text Analytics. | [[https://drive.google.com/file/d/1wc6yvn6Y5QrFXyFw53xeB4M6MsMmWssS/view?usp=sharing| 1 - Introduction to the Text Analytics course]]|J. Eisenstein. Introduction to Natural Language Processing. MIT Press. [[https://drive.google.com/file/d/17T4zo2uGssKBa_MrHsLW-uSmyP_ZJvpj/view?usp=sharing| Chp. 1]].|
 +| 2022/09/19 | Review of Probability. Language and Probability. | [[https://drive.google.com/file/d/1-exk-JS0_Oa3Eg1ApTGlxonQlL3KbTQG/view?usp=sharing| 2 - Reminds on Probability]]| |
 +| 2022/09/23 | Introduction to Python.| [[https://drive.google.com/file/d/1lpyA0N4K0d0ZTrJgokot1NwC_w4HG6gG/view?usp=sharing| 3 - Introduction to Python]]|[[https://drive.google.com/file/d/1BubwKtByCankjnbClWErvSsw9EjCLnte/view?usp=sharing|Introduction to Python - Notebook.]]| 
 +| 2022/09/30 | Introduction to Python (continued). Project Presentation and Important Dates. | [[https://drive.google.com/file/d/1FjCYvOkZDWomEsJuXD32Vl_155kxnKik/view?usp=sharing|Project and Dates]]| | 
 +| 2022/10/03 | Probabilistic Language Models. | [[https://drive.google.com/file/d/1B5HfPtPgK41Ig_NWrPim6YxK3mCF-XSj/view?usp=sharing| 5 - Probabilistic Language models]]|D. Jurafsky, J.H. Martin.[[https://drive.google.com/file/d/1OXSjwE0-ZN6DZ4MELOMp8JVy-tP2_4Iw/view?usp=sharing| Chp. 3]]. [[https://drive.google.com/file/d/1osuyJi5ZbBMghOrQz_IVqMsfxi2-1Vzj/view?usp=sharing| Probabilistic Language Models - Notebook]].| 
 +| 2022/10/07 | Text Indexing: Strings, Regular Expressions and BS4. | [[https://drive.google.com/file/d/1hkkjm5saUiKqL-9KgGgozTBupgIOus74/view?usp=sharing| 6 - Text Indexing-1]]|D. Jurafsky, J.H. Martin. [[https://drive.google.com/file/d/1RO_PGJj0a8v_N0dnw5iK4nGiabaAKZc5/view?usp=sharing| Chp. 2]]. [[https://drive.google.com/file/d/1IX8qSNdSbTFz5n1yMsMqtX9QU6HOofdv/view?usp=sharing| Strings, Regular Expressions and BS4 - Notebook]].|
 +| 2022/10/10 | Text Indexing: Linguistic annotation. NLTK. | [[https://drive.google.com/file/d/11AjdH0K1W5OytgdofaxlCP_nQlI-rRTB/view?usp=sharing| 6 - Text Indexing-2]]|[[https://drive.google.com/file/d/1uigGIb0_9bX2Gb5g6SX51JHyN3kxN4y_/view?usp=sharing| Linguistic annotation with NLTK - Notebook]].|
 +| 2022/10/14 | Text Indexing: Collocations with Gensim. stanza. spacy. Feature selection. | [[https://drive.google.com/file/d/13RDX2D2m8Bhkv0_qddvpWndBYQWoKYpY/view?usp=sharing| 6 - Text Indexing-3]]|[[https://drive.google.com/file/d/12L7nHe9TvZJPSS4RaiyGyIaPnx8cXkrN/view?usp=sharing| L6.3.4 - collocations - stanza - spacy - Notebooks]].|
 +| 2022/10/17 | Text Indexing: Vector space models. | [[https://drive.google.com/file/d/1AhhYq-1mCGqtVcUnvoiAb7c2WYs4CSm2/view?usp=sharing| 6 - Text Indexing-4]]|D. Jurafsky, J.H. Martin. [[https://drive.google.com/file/d/1A1aKTIQh8CnEU8QBkmet1iADpTBAdUHR/view?usp=sharing| Chp. 6]]. [[https://drive.google.com/file/d/1dyn540ISuJ8wMlBkUoFH5J9ctNIcHj54/view?usp=sharing| L6.5 - Vector space model - toy example - Notebook]].|
 +| 2022/10/21 | Machine Learning for Text Analytics. | [[https://drive.google.com/file/d/1eHQR4GhtPjgN7muIRLfyQBXcgM4oQmUK/view?usp=sharing| 10 - Machine Learning for Text Analytics]]| | 
 +| 2022/10/24 | Student project presentations: proposal, brainstorming, discussion. | | |
 +| 2022/10/28 | Student project presentations: proposal, brainstorming, discussion. | | |
 +| 2022/11/04 | Machine Learning for Text Analytics. Experiments and Practice. | [[https://drive.google.com/file/d/1HXC4pHde9D7bYYAw4vM6ihS2u7kChgO8/view?usp=share_link| 13 - Experiments]]| [[https://drive.google.com/file/d/1xbRmZ-HudXRIqBbQDpXjOlUNyrq7-yop/view?usp=share_link| Classification sklearn - Notebook.]]| 
 +| 2022/11/07 | Topic Modeling. | [[https://drive.google.com/file/d/1ytnJjLHtLT97gCNbzBp_I2TCcY7bfjMN/view?usp=share_link| 14 - Topic modeling]]| Zhai and Massung (2016) Text Data Management and Analysis. [[https://drive.google.com/file/d/1iJ71WZIpWP-cWxLtvsf5L4vp_epJH0uV/view?usp=share_link| Chp 17]].[[https://drive.google.com/file/d/1fKpyNYs9kNlPJpiYiDkzO_j6TkyHM8sS/view?usp=share_link| Topic Modeling - Notebooks.]]| 
 +| 2022/11/11 | A primer on Neural Networks. | [[https://drive.google.com/file/d/1_snMjfUb1z5YLBEHft6HJo65w4EWYD-v/view?usp=share_link|15 - A Primer on Neural Networks]]| |  
 +| 2022/11/14 | A primer on Neural Networks (continued). Practice.| | [[https://drive.google.com/file/d/1UEKJ_E1hD92E4OPw5HUOhvf1NKxxTR2T/view?usp=share_link| From SVM to NN, Classification with Keras - Notebooks.]]| 
 +| 2022/11/18 | Neural Language Models. Word2vec | [[https://drive.google.com/file/d/1Juf8aMqg_c5wW1KvQxfzvz2A6diV4a4A/view?usp=share_link| 17 - Neural Language Models-1]]|[[https://drive.google.com/file/d/1ffEsnsmb_o3iX9YBkS095UMPrMOMGrdO/view?usp=share_link|Word2vec with Gensim - Notebook.]]| 
 +| 2022/11/21 | Neural Language Models. Doc2vec. Transformer. BERT. | [[https://drive.google.com/file/d/10_VjJacKzajp7yNuSOhZo-nNUJjChkcN/view?usp=share_link| 18 - Neural Language Models-2]]|D. Jurafsky, J.H. Martin. Chps. [[https://drive.google.com/file/d/14oI6vsl4KCpGyamBbVjeYPTuzWOSNEtV/view?usp=share_link|7]], [[https://drive.google.com/file/d/1wonZ08i0etFhEMSjQEVU2vKf6UyUjEHb/view?usp=share_link|9]], [[https://drive.google.com/file/d/1BsCfRzp3t6xAe4GfUTZbHfBujXyxtdAA/view?usp=share_link|11]]. [[https://drive.google.com/file/d/1hs6ffqsn1gLM6RXSsFcTYjfh-AjFYXDu/view?usp=share_link|Doc2vec with Gensim - Notebook.]]|
 +| 2022/11/25 | Seminar (Alessandro Bondielli). Evaluating strategies for Automatic Profiling of Résumés.| |[[https://drive.google.com/file/d/1dopSg44-kSGhIo3nv2wLFis7IRePp6xM/view?usp=share_link|A case study.]] | 
 +| 2022/12/02 | Student project presentations: ongoing experiments. Discussion. | | |
 +| 2022/12/05 | Student project presentations: ongoing experiments. Discussion. | | |
 +| 2022/12/09 | Fine-tuning BERT. Advanced applications (Conversational Agents, Affective Computing).| [[https://drive.google.com/file/d/1RdiNnhM5he2ZIfLBFZ-dPdhlDZMZEHjO/view?usp=share_link| 22 - Advanced applications]]| [[https://drive.google.com/file/d/1q7ZsRYoA4fL4e0VRytezq1-b6s-FpkJD/view?usp=share_link|BERT finetune - Notebooks]]. Recommended chapters: D. Jurafsky, J.H. Martin. [[https://drive.google.com/file/d/1BWfVPq4HiTWzvUHGaqaEkJEWieQxhf-g/view?usp=share_link|20]]; [[https://drive.google.com/file/d/148pdYBYtUCwCHR349-HDjEMJMS11ONDi/view?usp=share_link|24]].|
 +  
 + 
 +==== Exam ==== 
 + 
 +** Attending students ** 
 + 
 +The exam for attending students will consist of the development of a project to be agreed upon with the teacher and an oral exam. The outcome of the project will be some code and a report of the activity (4-10 pages is the typical length range). The oral exam will consist of the presentation and discussion of the project. 
 +Projects may be based on challenges proposed in either research forums ([[https://alt.qcri.org/semeval2020/|SemEval]], [[http://www.evalita.it/|Evalita]]) or other platforms ([[https://kaggle.com|Kaggle]]). Students are also invited to propose a project based on other sources (e.g., recent papers on arXiv [[https://arxiv.org/list/cs.CL/new|CL]] or [[https://arxiv.org/list/cs.AI/new|AI]]), or their own interests. Students may work in groups of 3-5 people.
 + 
 + 
 +** Non-Attending students **  
 + 
 +The exam for non-attending students will consist of a written exam with open questions and exercises, and an oral discussion on the topics of the course.
  
-  - Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning
-  - Mathematical background: Probability, Statistics and Algebra
-  - Linguistic essentials: words, lemmas, morphology, PoS, syntax
-  - Basic text processing: regular expression, tokenisation
-  - Data gathering: Twitter API, scraping
-  - Basic modelling: collocations, language models
-  - Introduction to Machine Learning: theory and practical tips
-  - Libraries and tools: NLTK, Keras
-  - Applications:
-    * Classification/Clustering
-    * Sentiment Analysis/Opinion Mining
-    * Information Extraction/Relation Extraction
-    * Entity Linking
-    * Spam Detection: mail spam & phishing, blog spam, review spam
+Written test [[https://drive.google.com/file/d/1Q-NVz_x-UjllTG-CPAKGV4aKmK4Hz5af/view?usp=share_link|example]].
  
-==== Jupyter Notebook Server ==== 
-A server has been setup for running [[http://attardi-4.di.unipi.it:8000|Jupyter Notebooks]]. 
-In order to log into the server, you must get credentials for a Google Suite account:[[https://gsuite.signup.unipi.it|go to this page]] and register with your University credentials to activate your free account. 
  
-====== Lecture Notes ====== 
  
-^ Date ^ Lecture ^ Notes ^
-| 17/9/2018 | Introduction | {{ :mds:txa:1-intro.pptx | Text Analytics}} |
-| 18/9/2018 | Introduction to Probability | {{ :mds:txa:2-probability.pptx | Probability}} |
-| 24/9/2018 | Language Modeling | {{ :mds:txa:3-languagemodeling.pdf | Language Modeling}} |
+==== Textbooks ====
+It is recommended to read selected chapters from:
  
-====== Textbooks ====== 
  
   - D. Jurafsky, J.H. Martin, [[https://web.stanford.edu/~jurafsky/slp3/|Speech and Language Processing]]. 3rd edition, Prentice-Hall, 2018.
-  - B. Liu, [[https://www.cs.uic.edu/~liub/FBS/SentimentAnalysis-and-OpinionMining.html|Sentiment Analysis and Opinion Mining]]. Morgan & Claypool Publishers, 2012. 
   - S. Bird, E. Klein, E. Loper. [[https://www.nltk.org/book/|Natural Language Processing with Python]].
  
-====== Previous Editions ======
+Further bibliography will be indicated in the material for the individual lectures.
 +==== Previous editions ====
  
 +  * [[http://didawiki.cli.di.unipi.it/doku.php/mds/txa/start?rev=1649067582|2021-2022]]
 +  * [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1612257498|2020-2021]]
 +  * [[https://elearning.di.unipi.it/course/view.php?id=162|2019-2020]]
 +  * [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1551450538|2018-2019]]
   * [[http://didawiki.di.unipi.it/doku.php/mds/txa/start?rev=1515682954|2017-2018]]
  
mds/txa/start.1537806434.txt.gz · Last modified: 24/09/2018 at 16:27 (6 years ago) by Giuseppe Attardi
