Text Analytics A.Y. 2020/21

Teacher

Andrea Esuli (andrea.esuli@isti.cnr.it)

Office hours: by appointment; send an email to arrange one.

Schedule

Lectures will be given using Microsoft Teams. Join the Text Analytics Team here.

Lecture recordings are available on Microsoft Teams for later viewing.

| Day | Hour | Room |
| --- | --- | --- |
| Wednesday | 9-11 | Text Analytics Team |
| Thursday | 9-11 | Text Analytics Team |

Objectives

The course covers text analytics systems and applications that address business problems by discovering and presenting knowledge that is otherwise locked in textual form. The objective is to learn to recognize situations in which text analytics techniques can meet information processing needs, to identify the analytic task or process that best models the business problem, to select the most appropriate resources, methods, and tools, and to collect text data and apply those methods to it. Several application contexts will be presented: information extraction, sentiment analysis (what is the nature of commentary on an issue), spam and fake post detection, quantification problems, summarization, etc.

  1. Disciplinary background: Natural Language Processing, Information Retrieval and Machine Learning
  2. Mathematical background: Probability, Statistics and Algebra
  3. Linguistic essentials: words, lemmas, morphology, PoS, syntax
  4. Basic text processing: regular expressions, tokenisation
  5. Data collection: Twitter API, scraping
  6. Basic modelling: collocations, language models
  7. Introduction to Machine Learning: theory and practical tips
  8. Libraries and tools: NLTK, spaCy, Keras, PyTorch
  9. Classification/Clustering
  10. Sentiment Analysis/Opinion Mining
  11. Information Extraction/Relation Extraction/Entity Linking
  12. Transfer learning
  13. Quantification
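
As a rough illustration of topics 4 and 6 above, the following minimal sketch tokenises a toy text with a regular expression and estimates a bigram probability from raw counts. It is not taken from the course material: the function names `tokenize` and `bigram_probability`, the token pattern, and the toy corpus are all assumptions chosen for illustration, and the estimate ignores sentence boundaries and uses no smoothing.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression.

    The pattern is an illustrative assumption: runs of letters/digits,
    optionally followed by an apostrophe suffix (e.g. "don't").
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def bigram_probability(tokens, first, second):
    """Maximum-likelihood estimate of P(second | first) from raw counts."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # Only tokens that are followed by another token can start a bigram.
    context_counts = Counter(tokens[:-1])
    if context_counts[first] == 0:
        return 0.0  # unseen context: no estimate without smoothing
    return bigram_counts[(first, second)] / context_counts[first]


# Hypothetical toy corpus, not course data.
toy_corpus = "The cat sat on the mat. The cat slept."
tokens = tokenize(toy_corpus)
p_cat_given_the = bigram_probability(tokens, "the", "cat")
print(tokens)
print(p_cat_given_the)
```

In this toy corpus "the" starts three bigrams and two of them continue with "cat", so the estimate is 2/3. A realistic model would add sentence-boundary markers and smoothing for unseen bigrams.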

Exam

The exam consists of a project, to be agreed upon with the teacher, and an oral exam. The project deliverables are code and a report on the activity (typically 4-10 pages). The oral exam consists of the presentation and discussion of the project.

Lecture Notes

| Date | Lecture | Notes |
| --- | --- | --- |
| 2020/09/16 | Introduction to the course | 00_-_introduction_to_the_text_analytics_course.pdf, 01_-_natural_language_and_text_analytics.pdf |
| 2020/09/17 | Introduction to probability | 02_-_introduction_to_probability.pdf |
| 2020/09/23 | Setup of Python environment | 03_-_introduction_to_python.pdf |
| 2020/09/24 | Introduction to Python | 03_1_introduction_to_python.zip |
| 2020/09/30 | Probabilistic Language Models | 04_-_probabilistic_language_models.pdf |
| 2020/10/01 | Probabilistic Language Models | 04_1_probabilisticlanguagemodel.zip |
| 2020/10/07 | Text Indexing, Regular expressions | 05_-_text_indexing.pdf, 05.1_-_strings_regular_expressions_and_bs4.zip |
| 2020/10/08 | NLTK, Collocations | 05.2_-_nltk.zip, 05.3_-_collocations.zip |
| 2020/10/14 | NLP tools, Spacy, Text indexing, preprocessing | 05.4_-_spacy_text_processing.ipynb.zip |
| 2020/10/15 | Vector space model, ML for text analytics | 06_-_machine_learning_for_text_analytics.pdf |
| 2020/10/21 | Scikit learn, pipeline | 06_1_classification_sklearn.zip |
| 2020/10/22 | Feature engineering | 06_2_classification_feature_engineering.zip |
| 2020/10/28 | Experimental protocols, optimization | 07_-_experiments.pdf, 07_1_optimization_sklearn.zip |
| 2020/10/29 | Sequence labeling, information extraction | 08_-_information_extraction.pdf |
| 2020/11/04 | Inception, spacy | 08_1_spacy_ner_train.zip |
| 2020/11/05 | Data collection | 09_-_data_collection.pdf, 09_1_scraping.zip, 09_2_data_from_twitter.zip |
| 2020/11/11 | Introduction to neural networks | 10_-_a_primer_on_neural_networks.pdf, 10.1_-_example_of_backpropagation.pdf |
| 2020/11/12 | From SVM to NN, deep learning | 10_2_svm_to_nn.zip |
| 2020/11/18 | Convolutional and Recurrent networks, text generation | 10_3_classification_cnnnet.zip, 10_4_classification_lstmnet.zip, 10_5_textgeneration.zip |
| 2020/11/19 | Word embeddings, neural language models | 11_-_neural_language_models.pdf, 11_1_wordembeddings.zip |
| 2020/11/25 | Document embeddings, the Transformer | 11_2_documentembeddings.zip |
| 2020/11/26 | BERT fine-tuning | 11_3_bert_finetune_binary.zip, 11_4_bert_finetune_multiclass.zip, 11_5_simpletransformers_finetune_binary.zip, 11_6_simpletransformer_generation_and_representation.zip |

Textbooks

  1. D. Jurafsky, J.H. Martin, Speech and Language Processing. 3rd edition, Prentice-Hall, 2018.
  2. B. Liu, Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.
  3. S. Bird, E. Klein, E. Loper. Natural Language Processing with Python.

Previous editions

mds/txa/start.txt · Last modified: 19/11/2020 at 11:18 (13 days ago) by Andrea Esuli