September 6, 2019

Reflected Text Analysis beyond Linguistics

Cologne. From September 9 to 13, I will be giving a class on Reflected Text Analysis beyond Linguistics as part of the DGfS-CL fall school 2019 at the IMS at Stuttgart University. The class is also part of the CRETA Coaching.

This post serves as the course page and contains the agenda, the material, etc.

Agenda

Day        | 14:00-15:30                                            | 16:00-17:30
Monday     | Introduction, Overview, Annotation                     | Annotation exercise, Inter-Annotator Agreement
Tuesday    | Machine learning overview and evaluation, algorithms   | Algorithms
Wednesday  | Introduction to the shared task, hands-on session      | Hands-on session
Thursday   | Excursion to the German Literature Archive, Marbach (starting at 1pm!)
Friday     | Hands-on session, shared task evaluation               | What to do next, closing discussion

Material

Participants are asked to install the following on their computers (this can be done during the first day of the class):

Python

  • Python: If your computer already has Python 2, there is no need to update. If you’re installing Python from scratch, please use Python 3.
  • pip: The Python package manager
  • The Python libraries nltk and requests.

Detailed instructions for Windows, Mac OS X and Linux can be found here (PDF file). The file test_install.py can be used to test the installation.
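For a quick check in the same spirit, here is a minimal sketch (not the course's test_install.py, whose contents are not reproduced here). It only tests whether the two libraries listed above can be imported; the function name check_modules is made up for this illustration.

```python
import importlib

# Module names taken from the installation list above.
REQUIRED_MODULES = ["nltk", "requests"]


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED_MODULES).items():
        print("{}: {}".format(name, "OK" if ok else "MISSING"))
```

If a library is reported as MISSING, installing it with pip (for example, pip install nltk requests) should fix it.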

Text Editor

For editing Python files, participants will need a plain text editor. We recommend the following:

Slides

Monday

Tuesday

Wednesday

Friday

Projects (for ECTS credit points)

If you’re interested in getting ECTS credit points for taking part in this class, you’ll need to conduct a small project, according to the following recipe (unless we have agreed on a different plan):

  1. Pick a task (e.g., part of speech tagging)
  2. Pick a non-standard text that is not too long (e.g., a poem)
  3. Create a gold standard by applying the annotation guidelines for the task
  4. Apply an existing tool for the task
  5. Evaluate the tool against your annotations
  6. Either
    • Develop hypotheses for improving/adapting the tool, or
    • Retrain the tool on existing training data and your own corpus, and re-evaluate it after adding your own data
  7. Write a brief report on this and send it to me
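
Step 5 of the recipe above (evaluating the tool against your annotations) could look roughly like the following sketch for a tagging task. It assumes that your gold standard and the tool's output are two equally long lists of tags, one per token. The function name evaluate_tags and the toy data are hypothetical and only serve as illustration; for other tasks, different metrics may be more appropriate.

```python
from collections import Counter


def evaluate_tags(gold, predicted):
    """Compare two equally long tag sequences.

    Returns the overall accuracy and a dict of per-tag (precision, recall).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot evaluate empty sequences")

    correct = Counter()            # tag -> number of tokens where both agree on it
    gold_count = Counter(gold)     # tag -> how often it occurs in the gold standard
    pred_count = Counter(predicted)  # tag -> how often the tool assigned it
    for g, p in zip(gold, predicted):
        if g == p:
            correct[g] += 1

    accuracy = sum(correct.values()) / len(gold)
    per_tag = {}
    for tag in sorted(set(gold) | set(predicted)):
        # Precision/recall default to 0.0 when the tag never occurs on that side.
        precision = correct[tag] / pred_count[tag] if pred_count[tag] else 0.0
        recall = correct[tag] / gold_count[tag] if gold_count[tag] else 0.0
        per_tag[tag] = (precision, recall)
    return accuracy, per_tag


# Hypothetical toy data: gold tags from your annotation, predicted tags from the tool.
gold = ["DET", "NOUN", "VERB", "DET", "NOUN"]
predicted = ["DET", "NOUN", "NOUN", "DET", "NOUN"]

accuracy, per_tag = evaluate_tags(gold, predicted)
print("accuracy:", accuracy)
for tag, (p, r) in per_tag.items():
    print(tag, "precision:", round(p, 2), "recall:", round(r, 2))
```

Running the same function before and after retraining gives the comparison needed for the re-evaluation in step 6.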

Your project should be finished (and the report sent to me) before October 14.