Assignment 3: Predicting syntactic function of a word from its morphological features
Deadline: June 26, 10:00 CEST.
In this exercise, you will predict the syntactic function of a word, as specified in a dependency treebank, from its morphological features. This is a toy exercise to get you started with neural networks. However, it is also a part of what a real dependency parser does.
For this exercise, we will use data from the Universal Dependencies treebanks.
Universal Dependencies (UD) treebanks include morphological and syntactic annotations, and all treebanks are uniformly represented in CoNLL-U format.
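For reference, a token line in a CoNLL-U file has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The following two-token fragment is a constructed illustration, not taken from a real treebank:

1	Dogs	dog	NOUN	NNS	Number=Plur	2	nsubj	_	_
2	bark	bark	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	0	root	_	_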
For this set of exercises, you are strongly recommended to use the Python Keras library. If you have a strong preference for another environment, please contact the tutors or the instructor.
All your code should be in the provided template file predict_dependency.py; follow the instructions provided below and in the template file.
The exercise
3.1 Encoding data
Write a Python class Encoder for encoding the input and output of your network. Implement the class using the interface defined as follows. The fit() method of your class should take a CoNLL-U file and perform the necessary bookkeeping such that the transform() method described below works as intended. The transform() method takes a CoNLL-U file as input and returns two variables: features, which contains the multi-hot encoded morphological features (FEATS, column 6 of the CoNLL-U file) and the universal POS tag field (UPOS, column 4), and labels, which is the one-hot encoded version of the dependency label (DEPREL, column 8). The features should be returned as an n × m numpy array, and labels as an n × k numpy array, where n is the number of tokens in the CoNLL-U file, m is the number of unique feature-value pairs in the CoNLL-U file passed to the fit() method (including POS tags: you should consider pos=NOUN, pos=VERB, etc. as additional feature-value pairs), and k is the number of unique dependency labels in the CoNLL-U file passed to the fit() method.
You are free to adapt your earlier implementation (from assignment 2), or use any appropriate library function.
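A minimal sketch of one possible Encoder implementation is given below. It reads the CoNLL-U file directly, skipping comment lines, multiword tokens, and empty nodes; everything beyond the fit()/transform() interface is illustrative only, and the template file remains authoritative.

import numpy as np

class Encoder:
    """Multi-hot encoder for FEATS+UPOS, one-hot encoder for DEPREL (sketch)."""

    def _read_tokens(self, conllu_file):
        # Yield (set of feature-value pairs, dependency label) per token line.
        with open(conllu_file, encoding='utf-8') as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith('#'):
                    continue
                cols = line.split('\t')
                if '-' in cols[0] or '.' in cols[0]:
                    continue  # skip multiword tokens and empty nodes
                feats = set() if cols[5] == '_' else set(cols[5].split('|'))
                feats.add('pos=' + cols[3])  # UPOS as an extra feature-value pair
                yield feats, cols[7]

    def fit(self, conllu_file):
        # Record the feature-value and label inventories of the training file.
        featset, labelset = set(), set()
        for feats, label in self._read_tokens(conllu_file):
            featset.update(feats)
            labelset.add(label)
        self.feat_idx = {f: i for i, f in enumerate(sorted(featset))}
        self.label_idx = {l: i for i, l in enumerate(sorted(labelset))}

    def transform(self, conllu_file):
        tokens = list(self._read_tokens(conllu_file))
        features = np.zeros((len(tokens), len(self.feat_idx)))
        labels = np.zeros((len(tokens), len(self.label_idx)))
        for i, (feats, label) in enumerate(tokens):
            for f in feats:
                if f in self.feat_idx:  # ignore feature-value pairs unseen in fit()
                    features[i, self.feat_idx[f]] = 1
            if label in self.label_idx:
                labels[i, self.label_idx[label]] = 1
        return features, labels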
3.2 Define a feed-forward network
Implement the function build_model() that constructs and returns a feed-forward network with the required arguments input_len and output_len and the other optional arguments defined in the template file.
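As an illustration, a minimal model along these lines could look as follows; the hidden layer size and other hyperparameters here are placeholder assumptions, and the authoritative argument list is in the template file.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_model(input_len, output_len, hidden_size=128):
    # Single hidden layer; softmax output over the dependency labels.
    model = Sequential([
        Dense(hidden_size, activation='relu', input_shape=(input_len,)),
        Dense(output_len, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model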
3.3 Train the model
Implement the function train() in the template. Given a Keras model (returned by build_model() above) and the encoded input and output for the training and development sets (following the training/development split of the input treebank), your function should train the model on the designated training data until max_epoch is reached or the development set loss stops improving.
You are encouraged to experiment with various network parameters, and/or to modify the default values to optimal ones for the treebank(s) you are working on. However, tuning your model is not required for this assignment.
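A possible train() implementation using Keras early stopping is sketched below; the argument names and the patience value are assumptions, so follow the signature given in the template.

from tensorflow.keras.callbacks import EarlyStopping

def train(model, train_x, train_y, dev_x, dev_y, max_epoch=100):
    # Stop when the development-set loss no longer improves.
    early_stop = EarlyStopping(monitor='val_loss', patience=3,
                               restore_best_weights=True)
    model.fit(train_x, train_y,
              validation_data=(dev_x, dev_y),
              epochs=max_epoch,
              callbacks=[early_stop])
    return model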
3.4 Print evaluation scores
Implement the function print_scores() in the template, which takes a (trained) Keras model (returned by build_model() above) and a CoNLL-U file as arguments, and prints out the following evaluation information.
- Macro-averaged precision, recall, F1 score and accuracy of the model
- The confusion matrix
- Accuracy of the two-best predictions of the network, such that the model's output is accepted as correct if the gold value was the model's first or second best prediction
A shortened example output is given below.
Precision: 0.35, Recall: 0.37, F1-score: 0.35, Accuracy: 0.76
Confusion matrix:
acl advc advm amod appo
acl 0 0 5 0 0
advcl 0 0 3 0 0
advmod 0 0 1318 3 0
amod 0 0 41 756 0
appos 0 0 7 1 4
Accuracy two-best: 0.78
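One way to compute these scores is sketched below using scikit-learn. The encoder argument and the exact call details are assumptions for illustration; the template file is authoritative.

import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

def print_scores(model, conllu_file, encoder):
    features, labels = encoder.transform(conllu_file)
    probs = model.predict(features)
    gold = labels.argmax(axis=1)
    pred = probs.argmax(axis=1)
    p, r, f1, _ = precision_recall_fscore_support(gold, pred,
                                                  average='macro',
                                                  zero_division=0)
    acc = accuracy_score(gold, pred)
    print(f'Precision: {p:.2f}, Recall: {r:.2f}, '
          f'F1-score: {f1:.2f}, Accuracy: {acc:.2f}')
    print('Confusion matrix:')
    print(confusion_matrix(gold, pred))
    # Two-best accuracy: correct if gold is among the top two predictions.
    top2 = np.argsort(probs, axis=1)[:, -2:]
    acc2 = np.mean([g in t for g, t in zip(gold, top2)])
    print(f'Accuracy two-best: {acc2:.2f}')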
3.5 Train and test your model on a subset of UD treebanks
Run your program on any 10 UD treebanks of your choice.
Write the output of your script (macro-averaged precision, recall, F1-score and accuracy) to a text file named results.txt in a format similar to the following:
UD_Basque-BDT Precision: 0.4277, Recall: 0.4561, F1 score: 0.4334, Accuracy: 0.7234
UD_Chinese-GSD Precision: 0.3117, Recall: 0.3520, F1 score: 0.2992, Accuracy: 0.5147
UD_English-EWT Precision: 0.3628, Recall: 0.3973, F1 score: 0.3604, Accuracy: 0.6373
UD_Finnish-FTB Precision: 0.5953, Recall: 0.5013, F1 score: 0.4973, Accuracy: 0.7758
UD_German-GSD Precision: 0.4307, Recall: 0.4376, F1 score: 0.4069, Accuracy: 0.7592
UD_Hebrew-HTB Precision: 0.4563, Recall: 0.4916, F1 score: 0.4421, Accuracy: 0.6750
UD_Latin-ITTB Precision: 0.4885, Recall: 0.4867, F1 score: 0.4520, Accuracy: 0.7551
UD_Serbian-SET Precision: 0.4958, Recall: 0.4913, F1 score: 0.4657, Accuracy: 0.7379
UD_Turkish-IMST Precision: 0.4810, Recall: 0.4813, F1 score: 0.4464, Accuracy: 0.6600
UD_Wolof-WTB Precision: 0.4450, Recall: 0.4433, F1 score: 0.4206, Accuracy: 0.6638
The results are from a “vanilla” network with no tuning. They will vary from run to run, but should still be indicative of a lower bound on the scores.
Do not forget to add the results file to your repository. Note that some treebanks do not have a train/dev/test split; you should choose ones that include all three sets. You are encouraged to pick treebanks of languages with different morphological typology.
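A hypothetical driver loop for this step might look as follows; the treebank directory layout, file names, and function signatures are assumptions you will need to adapt (actual UD files are named like en_ewt-ud-train.conllu). Redirecting the printed lines into results.txt produces the required file.

treebanks = ['UD_English-EWT', 'UD_Finnish-FTB']  # extend to your 10 treebanks

for tb in treebanks:
    # Hypothetical file layout; adjust to the actual UD file names.
    train_file = tb + '/train.conllu'
    dev_file = tb + '/dev.conllu'
    test_file = tb + '/test.conllu'
    encoder = Encoder()
    encoder.fit(train_file)
    train_x, train_y = encoder.transform(train_file)
    dev_x, dev_y = encoder.transform(dev_file)
    model = build_model(train_x.shape[1], train_y.shape[1])
    train(model, train_x, train_y, dev_x, dev_y)
    print(tb)
    print_scores(model, test_file, encoder)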