Late Summer School 'Machine Learning for Language Analysis'
I’m happy to announce that I’ll be giving a two-day class on machine learning for reflected text analysis during the late summer school in Cologne, Germany.
The class takes place on September 26 and 27, and its main goal is to convey a basic understanding of how machine learning algorithms work concretely. The class will include both a theoretical introduction to (some) algorithms and a hands-on session in the form of a small shared task using Python. The application deadline is August 20.
The hands-on session in the class will be supported by Nathalie Wiedmer.
TOC: Announcement, Preparations, Agenda, Material
Announcement
The theoretical basics of machine learning methods are presented in a mixture of hackathon and tutorial, including an example implementation in Python and the concrete evaluation of text-analytical methods within the framework of a small shared task.
Preparations
Participants are asked to install the following on their computers:
Python
- Python: If your computer already has Python 2, there is no need to update. If you’re installing Python from scratch, please use Python 3.
- pip: The Python package manager
- The Python libraries nltk and requests
Detailed instructions for Windows, Mac OS X and Linux can be found here (PDF file). The file test_install.py can be used to test the installation.
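As a rough illustration of what such an installation check can look like, here is a minimal sketch. It is not the contents of test_install.py. It assumes Python 3, and the names REQUIRED and missing_packages are made up for this example.

```python
import importlib.util
import sys

# The two libraries named in the preparation list above.
REQUIRED = ("nltk", "requests")


def missing_packages(names=REQUIRED):
    """Return those top-level package names that Python cannot locate."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version:", sys.version.split()[0])
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Try: python -m pip install " + " ".join(missing))
    else:
        print("All required packages were found.")
```

Running the script from a terminal prints the Python version and either lists the libraries that still need to be installed with pip or confirms that both were found.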
Text Editor
For editing Python files, participants will need a plain text editor. We recommend the following:
Agenda
Time | Wednesday, September 26 | Thursday, September 27 |
---|---|---|
09:00 | Introduction, machine learning basics | Hands on (continued) |
10:30 | coffee break | coffee break |
11:00 | Machine learning algorithms | Shared task evaluation |
12:30 | lunch break | lunch break |
14:00 | Shared task introduction | What to do next |
15:30 | coffee break | coffee break |
16:00 | Hands on | Closing discussion |
17:00 | closing | closing |
Material
Hackatorial Package
Please download this zip file and extract it into a directory on your drive. The zip file contains:
- Data with annotated entity references (subdirectory data)
- Code for training, testing and uploading (subdirectory code)
- Resources used for feature extraction (subdirectory static)
We will go over all these things in the shared task introduction.
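To confirm that the package was extracted as expected, a small check along the following lines may help. This is only a sketch: it assumes the three subdirectories sit directly inside the directory you extracted into, and the function name missing_subdirs is hypothetical, not part of the package.

```python
from pathlib import Path

# Subdirectory names taken from the package description above.
EXPECTED_SUBDIRS = ("data", "code", "static")


def missing_subdirs(package_root, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectories that are absent under package_root."""
    root = Path(package_root)
    return [name for name in expected if not (root / name).is_dir()]
```

Calling missing_subdirs with the path of your extraction directory returns an empty list if all three subdirectories are present, and the names of the missing ones otherwise.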
Slides
- Introduction
- Machine learning basics
- Machine learning algorithms
- Shared task introduction
- Shared task evaluation
- Results (the results were finalized on October 1 at 11 am; submissions are no longer possible)
- Addon
- What to do next