August 16, 2018

Late Summer School 'Machine Learning for Language Analysis'

I’m happy to announce that I’ll be giving a two-day class on machine learning for reflected text analysis during the late summer school in Cologne, Germany.

The class takes place on September 26 and 27, and its main goal is to convey a basic understanding of how machine learning algorithms work concretely. The class will include both a theoretical introduction to (some) algorithms and a hands-on session in the form of a small shared task using Python. The application deadline is August 20.

The hands-on session in the class will be supported by Nathalie Wiedmer.

TOC: Announcement, Preparations, Agenda, Material

Announcement

The theoretical basics of machine learning methods are presented in a mixture of hackathon and tutorial, including an example implementation in Python and the concrete evaluation of text-analytical methods within the framework of a small shared task.

Preparations

Participants are asked to install the following on their computers:

Python

Python: If your computer already has Python 2, there is no need to update. If you’re installing Python from scratch, please use Python 3.
pip: The Python package manager
The Python libraries nltk and requests.

Detailed instructions for Windows, Mac OS X and Linux can be found here (PDF file). The file test_install.py can be used to test the installation.
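
For a quick check before the class, a minimal sketch of such an installation test could look like the following. This is not the contents of test_install.py; the function name check_modules is made up for illustration, and the only thing taken from the list above is that nltk and requests should be importable. It should run under both Python 2 and Python 3.

```python
import importlib
import sys


def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported.

    Hypothetical helper for illustration; not part of the class material.
    """
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    # Report the interpreter version, then the two libraries the class needs.
    print("Python version: %d.%d" % sys.version_info[:2])
    for name, ok in sorted(check_modules(["nltk", "requests"]).items()):
        if ok:
            print("%s: ok" % name)
        else:
            print("%s: missing (try: pip install %s)" % (name, name))
```

If a library is reported as missing, installing it with pip and running the script again is usually enough.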

Text Editor

For editing Python files, participants will need a plain text editor. We recommend the following:

Windows: Notepad++
Mac OS X: TextMate

Agenda

Time	Wednesday, September 26	Thursday, September 27
09:00	Introduction, machine learning basics	Hands on (continued)
10:30	coffee break	coffee break
11:00	Machine learning algorithms	Shared task evaluation
12:30	lunch break	lunch break
14:00	Shared task introduction	What to do next
15:30	coffee break	coffee break
16:00	Hands on	Closing discussion
17:00	closing	closing

Material

Hackatorial Package

Please download this zip file and extract it into a directory on your drive. The zip file contains

Data with annotated entity references (sub directory data)
Code for training, testing and uploading (sub directory code)
Resources used for feature extraction (sub directory static)

We will go over all these things in the shared task introduction.
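
To confirm that the zip file was extracted completely, a small sketch like the one below can check for the three sub directories listed above. The function name missing_subdirs and the idea of passing the extraction directory as an argument are assumptions made for illustration; only the directory names data, code and static come from the package description.

```python
import os
import sys

# Sub directories the package is described as containing.
EXPECTED_SUBDIRS = ("data", "code", "static")


def missing_subdirs(package_root):
    """Return the expected sub directories that are absent under package_root.

    Hypothetical helper for illustration; not part of the package itself.
    """
    return [name for name in EXPECTED_SUBDIRS
            if not os.path.isdir(os.path.join(package_root, name))]


if __name__ == "__main__":
    # Use the directory given on the command line, or the current one.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_subdirs(root)
    if missing:
        print("Missing sub directories: " + ", ".join(missing))
    else:
        print("All expected sub directories found.")
```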

Slides

Introduction
Machine learning basics
Machine learning algorithms
Shared task introduction
Shared task evaluation
Results (the results were saved and frozen on October 1 at 11 am; submissions are no longer possible)
Addon
What to do next