Material
Slides
Assessment (ideas for the final assignment: max. 2 pages + annexes)
- Choose a list of words and find the words most similar to them in a corpus (e.g. a book or several books). Use the pipeline learnt in the course and explain the procedure.
- Create a frequency dictionary from a corpus after removing stopwords.
- Do a quantitative analysis of a text or corpus: select the most frequent words and lemmas, and count the number of adjectives, nouns, verbs, adverbs, etc.
- Create a terminology of multi-word expressions from a corpus in a specific domain (sports, politics, biology, etc.).
- Create a glossary containing only place names from a text or corpus.
- Modify and improve the polarity lexicon of Linguakit, and compare the results of the sentiment analysis module before and after the modification.
- ...
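As an illustration of the frequency-dictionary idea above, here is a minimal sketch in Python using only the standard library. It is not the course pipeline: the stopword list, the sample sentence and the function name `frequency_dictionary` are all hypothetical and chosen only to keep the example short. A real assignment would use a complete stopword list for the language of the corpus and read the text from a file.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for this sketch);
# a real project would load a complete list for the corpus language.
STOPWORDS = {"the", "a", "an", "and", "of", "on", "in", "to", "is", "was"}


def frequency_dictionary(text, stopwords=STOPWORDS):
    """Return a Counter mapping each non-stopword token to its frequency."""
    # Lowercase the text and keep runs of letters only
    # (digits, underscores and punctuation act as separators).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)


# Made-up sample text standing in for a corpus.
sample = "The cat sat on the mat. The cat saw a dog, and the dog saw the cat."
freqs = frequency_dictionary(sample)

# Print the three most frequent remaining words with their counts.
for word, count in freqs.most_common(3):
    print(word, count)
```

The same `Counter` can be sorted or written to a file to produce the annex of the final assignment; counting lemmas instead of word forms would require a lemmatizer, which this sketch does not include.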