August 16, 2018

Late Summer School 'Machine Learning for Language Analysis'

I’m happy to announce that I’ll be giving a two-day class on machine learning for reflected text analysis during the late summer school in Cologne, Germany.

The class takes place on September 26 and 27, and its main goal is to convey a basic understanding of how machine learning algorithms work concretely. The class will include both a theoretical introduction to (some) algorithms and a hands-on session in the form of a small shared task using Python. The application deadline is August 20.

The hands-on session in the class will be supported by Nathalie Wiedmer.

TOC: Announcement, Preparations, Agenda, Material


The theoretical basics of machine learning methods are presented in a mixture of hackathon and tutorial, including an example implementation in Python and the concrete evaluation of text-analytical methods within the framework of a small shared task.


Participants are asked to install the following on their computers:


  • Python: If your computer already has Python 2, there is no need to update. If you’re installing Python from scratch, please use Python 3.
  • pip: The Python package manager
  • The Python libraries nltk and requests.

Detailed instructions for Windows, Mac OS X and Linux can be found here (PDF file). The file can be used to test the installation.
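
As a quick sanity check alongside the linked instructions, a minimal sketch like the following reports which Python version is running and whether nltk and requests can be imported. It is not the test file mentioned above, and the function name check_modules is made up for illustration. The sketch is written so that it should run under both Python 2.7 and Python 3.

```python
from __future__ import print_function

import importlib
import sys


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    # Show which interpreter is actually being used.
    print("Python version:", sys.version.split()[0])
    # nltk and requests are the two libraries named in the list above.
    for name, ok in sorted(check_modules(["nltk", "requests"]).items()):
        print(name, "OK" if ok else "MISSING (try: pip install " + name + ")")
```

If a library is reported as missing, installing it with pip and running the check again is usually enough.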

Text Editor

For editing Python files, participants will need a plain text editor. We recommend the following:


Time  | Wednesday, September 26               | Thursday, September 27
09:00 | Introduction, machine learning basics | Hands on (continued)
10:30 | coffee break                          | coffee break
11:00 | Machine learning algorithms           | Shared task evaluation
12:30 | lunch break                           | lunch break
14:00 | Shared task introduction              | What to do next
15:30 | coffee break                          | coffee break
16:00 | Hands on                              | Closing discussion
17:00 | closing                               | closing


Hackatorial Package

Please download this zip file and extract it into a directory on your drive. The zip file contains

  • Data with annotated entity references (subdirectory data)
  • Code for training, testing and uploading (subdirectory code)
  • Resources used for feature extraction (subdirectory static)

We will go over all these things in the shared task introduction.
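
To confirm that the package was extracted completely, a small sketch along these lines checks for the three subdirectories listed above. The directory name hackatorial and the function name missing_subdirs are assumptions for illustration; replace the path with wherever you extracted the zip file.

```python
from __future__ import print_function

import os

# Subdirectory names as listed in the package description.
EXPECTED_SUBDIRS = ("data", "code", "static")


def missing_subdirs(package_dir, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectory names that are absent from package_dir."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(package_dir, name))]


if __name__ == "__main__":
    # "hackatorial" is a hypothetical location, not a name given by the package.
    missing = missing_subdirs("hackatorial")
    if missing:
        print("Missing subdirectories:", ", ".join(missing))
    else:
        print("All expected subdirectories found.")
```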