Laura Pollacci (laura.pollacci [at] di [dot] unipi [dot] it)
The course covers text analytics systems and applications that address business problems by discovering and presenting knowledge that is otherwise locked in textual form. The main objectives of the course are:
|Date||Lecture||Slides||Material / Reference|
|2023/09/21||Introduction to the course, NLP & Text Analytics.||1 - Introduction to the Text Analytics course||J. Eisenstein. Introduction to Natural Language Processing. MIT Press. Chp. 1.|
|2023/09/22||Review of probability.||2 - Reminds on probability|
|2023/09/28||Introduction to Python.||3 - Introduction to Python||L3 - Introduction_to_Python.ipynb|
|2023/09/29||Introduction to Python - part 2. Project and Dates||4 - Project and Dates|
|2023/10/05||Probabilistic language models||5 - Probabilistic language models||D. Jurafsky, J.H. Martin. Chp. 3. L5 Probabilistic Language Model.ipynb|
|2023/10/06||Text Indexing: Strings, Regular Expressions and BS4.||6 - Text indexing 1||D. Jurafsky, J.H. Martin. Chp. 2. L6.1 - Strings Regular expressions and BS4.ipynb|
|2023/10/12||Linguistic annotation. NLTK.||6 - Text Indexing 2||L6.2 - Linguistic annotation with NLTK.ipynb|
|2023/10/13||Lesson canceled due to UNIPI orientation days.|
|2023/10/19||Feature Selection||6 - Text Indexing 3||L6.3 - Gensim collocations - Stanza - Spacy (Notebooks)|
|2023/10/20||Vector space models||6 - Text Indexing 4||D. Jurafsky, J.H. Martin. Chp. 6. L6.4 - Vector space model - toy example|
|2023/11/02||Machine Learning for Text Analytics.||10 - Machine Learning for Text Analytics - corrected|
|2023/11/03||Machine Learning for Text Analytics: Design Experimental Protocols. Student presentations: How to.||11 - Design Experimental Protocols. 11.1 - Student presentations: How to||L.11 - Classification with SkLearn|
|2023/11/09||Student project presentations: proposal, brainstorming, discussion.|
|2023/11/10||Student project presentations: proposal, brainstorming, discussion.|
|2023/11/16||Topic Modeling||12 - Topic Modeling||Zhai and Massung (2016) Text Data Management and Analysis. Chp. 17. L.12 - Topic Modeling - Notebook. L.12.1 - Topic Modeling pyLDAvis - Notebook|
|2023/11/17||A primer on Neural Networks||13 - A primer on Neural Networks|
|2023/11/23||Neural Networks||14 - Neural Networks||From SVM to NN, Classification with Keras - Notebooks.|
|2023/11/24||Neural Language Models||15 - Neural Language Models||D. Jurafsky, J.H. Martin. Chps. 7, 9, 11|
|2023/11/30||Student project presentations: ongoing experiments. Neural Language Models Practice||16 - Neural Language Models Word2Vec||Word2vec - Notebook.|
|2023/12/01||Student project presentations: ongoing experiments. Neural Language Models Practice||17 - Neural Language Models Doc2Vec||Doc2Vec - Notebook|
The exam for attending students consists of a project, to be agreed upon with the teacher, and an oral exam. The project deliverables are the code and a report on the activity (typically 4-10 pages). The oral exam consists of the presentation and discussion of the project. Projects may be based on challenges proposed in research forums (SemEval, EVALITA) or on other platforms (Kaggle). Students may also propose a project based on other sources (e.g., recent papers on arXiv CL or AI) or on their own interests. Students may work in groups of 3-5 people.
The exam for non-attending students consists of a written exam with open questions and exercises, followed by an oral discussion of the course topics.
Written test example.
It is recommended to read selected chapters from:
Further references will be listed in the material for each lesson.