Assignment 3: Predicting syntactic function of a word from its morphological features
Deadline: June 26, 10:00 CEST.
In this exercise, you will predict the syntactic function of a word, as specified in a dependency treebank, from its morphological features. This is a toy exercise to get you started with neural networks. However, it is also a part of what a real dependency parser does.
For this exercise, we will use data from the Universal Dependencies treebanks.
Universal Dependencies (UD) treebanks include morphological and syntactic annotations, and all treebanks are uniformly represented in CoNLL-U format.
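For reference, a token line in a CoNLL-U file has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The following two-token fragment is a constructed illustration, not taken from a real treebank:

1	Dogs	dog	NOUN	NNS	Number=Plur	2	nsubj	_	_
2	bark	bark	VERB	VBP	Mood=Ind|Tense=Pres|VerbForm=Fin	0	root	_	_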
For this set of exercises, you are strongly recommended to use the Python Keras library. If you have a strong preference for another environment, please contact the tutors or the instructor.
All your code should be in the provided template file predict_dependency.py; follow the instructions provided below and in the template file.
The exercise
3.1 Encoding data
Write a Python class Encoder for encoding the input and output of your network. Implement the class using the interface defined as follows. The fit() method of your class should take a CoNLL-U file and perform the necessary bookkeeping such that the transform() method described below works as intended. The transform() method takes a CoNLL-U file as input and returns two variables: features, which contains the multi-hot encoded morphological features (FEATS, column 6 of the CoNLL-U file) and the universal POS tag field (UPOS, column 4), and labels, which is the one-hot encoded version of the dependency label (DEPREL, column 8). The features should be returned as an n × m numpy array, and labels as an n × k numpy array, where n is the number of tokens in the CoNLL-U file, m is the number of unique feature-value pairs in the CoNLL-U file passed to the fit() method (including POS tags: you should consider pos=NOUN, pos=VERB, etc. as additional feature-value pairs), and k is the number of unique dependency labels in the CoNLL-U file passed to the fit() method.
You are free to adapt your earlier implementation (from assignment 2), or use any appropriate library function.
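A minimal sketch of one possible Encoder implementation is given below. It reads the CoNLL-U file directly, skipping comment lines, multiword tokens, and empty nodes; everything beyond the fit()/transform() interface is illustrative only, and the template file remains authoritative.

import numpy as np

class Encoder:
    """Multi-hot encoder for FEATS+UPOS, one-hot encoder for DEPREL (sketch)."""

    def _read_tokens(self, conllu_file):
        # Yield (set of feature-value pairs, dependency label) per token line.
        with open(conllu_file, encoding='utf-8') as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith('#'):
                    continue
                cols = line.split('\t')
                if '-' in cols[0] or '.' in cols[0]:
                    continue  # skip multiword tokens and empty nodes
                feats = set() if cols[5] == '_' else set(cols[5].split('|'))
                feats.add('pos=' + cols[3])  # UPOS as an extra feature-value pair
                yield feats, cols[7]

    def fit(self, conllu_file):
        # Record the feature-value and label inventories of the training file.
        featset, labelset = set(), set()
        for feats, label in self._read_tokens(conllu_file):
            featset.update(feats)
            labelset.add(label)
        self.feat_idx = {f: i for i, f in enumerate(sorted(featset))}
        self.label_idx = {l: i for i, l in enumerate(sorted(labelset))}

    def transform(self, conllu_file):
        tokens = list(self._read_tokens(conllu_file))
        features = np.zeros((len(tokens), len(self.feat_idx)))
        labels = np.zeros((len(tokens), len(self.label_idx)))
        for i, (feats, label) in enumerate(tokens):
            for f in feats:
                if f in self.feat_idx:  # ignore feature-value pairs unseen in fit()
                    features[i, self.feat_idx[f]] = 1
            if label in self.label_idx:
                labels[i, self.label_idx[label]] = 1
        return features, labels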
3.2 Define a feed-forward network
Implement the function build_model() that constructs and returns a feed-forward network with the required arguments input_len and output_len and the other optional arguments defined in the template file.
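As an illustration, a minimal model along these lines could look as follows; the hidden layer size and other hyperparameters here are placeholder assumptions, and the authoritative argument list is in the template file.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_model(input_len, output_len, hidden_size=128):
    # Single hidden layer; softmax output over the dependency labels.
    model = Sequential([
        Dense(hidden_size, activation='relu', input_shape=(input_len,)),
        Dense(output_len, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model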
3.3 Train the model
Implement the function train() in the template. Given a Keras model (returned by build_model() above) and the encoded input and output for the training and development sets (following the training/development split of the input treebank), your function should train the model on the designated training data until max_epoch is reached or the development set loss stops improving.
You are encouraged to experiment with various network parameters, and/or to modify the default values to optimal ones for the treebank(s) you are working on. However, tuning your model is not required for this assignment.
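A possible train() implementation using Keras early stopping is sketched below; the argument names and the patience value are assumptions, so follow the signature given in the template.

from tensorflow.keras.callbacks import EarlyStopping

def train(model, train_x, train_y, dev_x, dev_y, max_epoch=100):
    # Stop when the development-set loss no longer improves.
    early_stop = EarlyStopping(monitor='val_loss', patience=3,
                               restore_best_weights=True)
    model.fit(train_x, train_y,
              validation_data=(dev_x, dev_y),
              epochs=max_epoch,
              callbacks=[early_stop])
    return model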
3.4 Print evaluation scores
Implement the function print_scores() in the template, which takes a (trained) Keras model (returned by build_model() above) and a CoNLL-U file as arguments, and prints out the following evaluation information.
- Macro-averaged precision, recall, F1 score and accuracy of the model
- The confusion matrix
- Accuracy of the two-best predictions of the network, such that the model's output is accepted as correct if the gold value was the model's first or second best prediction
A shortened example output is given below.
Precision: 0.35, Recall: 0.37, F1-score: 0.35, Accuracy: 0.76
Confusion matrix:
acl advc advm amod appo
acl 0 0 5 0 0
advcl 0 0 3 0 0
advmod 0 0 1318 3 0
amod 0 0 41 756 0
appos 0 0 7 1 4
Accuracy two-best: 0.78
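One way to compute these scores is sketched below using scikit-learn. The encoder argument and the exact call details are assumptions for illustration; the template file is authoritative.

import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

def print_scores(model, conllu_file, encoder):
    features, labels = encoder.transform(conllu_file)
    probs = model.predict(features)
    gold = labels.argmax(axis=1)
    pred = probs.argmax(axis=1)
    p, r, f1, _ = precision_recall_fscore_support(gold, pred,
                                                  average='macro',
                                                  zero_division=0)
    acc = accuracy_score(gold, pred)
    print(f'Precision: {p:.2f}, Recall: {r:.2f}, '
          f'F1-score: {f1:.2f}, Accuracy: {acc:.2f}')
    print('Confusion matrix:')
    print(confusion_matrix(gold, pred))
    # Two-best accuracy: correct if gold is among the top two predictions.
    top2 = np.argsort(probs, axis=1)[:, -2:]
    acc2 = np.mean([g in t for g, t in zip(gold, top2)])
    print(f'Accuracy two-best: {acc2:.2f}')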
3.5 Train and test your model on a subset of UD treebanks
Run your program on any 10 UD treebanks of your choice.
Write the output of your script (macro-averaged precision, recall, F1-score and accuracy) to a text file named results.txt in a format similar to the following:
UD_Basque-BDT Precision: 0.4277, Recall: 0.4561, F1 score: 0.4334, Accuracy: 0.7234
UD_Chinese-GSD Precision: 0.3117, Recall: 0.3520, F1 score: 0.2992, Accuracy: 0.5147
UD_English-EWT Precision: 0.3628, Recall: 0.3973, F1 score: 0.3604, Accuracy: 0.6373
UD_Finnish-FTB Precision: 0.5953, Recall: 0.5013, F1 score: 0.4973, Accuracy: 0.7758
UD_German-GSD Precision: 0.4307, Recall: 0.4376, F1 score: 0.4069, Accuracy: 0.7592
UD_Hebrew-HTB Precision: 0.4563, Recall: 0.4916, F1 score: 0.4421, Accuracy: 0.6750
UD_Latin-ITTB Precision: 0.4885, Recall: 0.4867, F1 score: 0.4520, Accuracy: 0.7551
UD_Serbian-SET Precision: 0.4958, Recall: 0.4913, F1 score: 0.4657, Accuracy: 0.7379
UD_Turkish-IMST Precision: 0.4810, Recall: 0.4813, F1 score: 0.4464, Accuracy: 0.6600
UD_Wolof-WTB Precision: 0.4450, Recall: 0.4433, F1 score: 0.4206, Accuracy: 0.6638
The results are from a “vanilla” network with no tuning. They will vary from run to run, but should still be indicative of a lower bound on the scores.
Do not forget to add the results file to your repository. Note that some treebanks do not have a train/dev/test split; you should choose ones that include all three sets. You are encouraged to pick treebanks of languages with different morphological typology.
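A hypothetical driver loop for this step might look as follows; the treebank directory layout, file names, and function signatures are assumptions you will need to adapt (actual UD files are named like en_ewt-ud-train.conllu). Redirecting the printed lines into results.txt produces the required file.

treebanks = ['UD_English-EWT', 'UD_Finnish-FTB']  # extend to your 10 treebanks

for tb in treebanks:
    # Hypothetical file layout; adjust to the actual UD file names.
    train_file = tb + '/train.conllu'
    dev_file = tb + '/dev.conllu'
    test_file = tb + '/test.conllu'
    encoder = Encoder()
    encoder.fit(train_file)
    train_x, train_y = encoder.transform(train_file)
    dev_x, dev_y = encoder.transform(dev_file)
    model = build_model(train_x.shape[1], train_y.shape[1])
    train(model, train_x, train_y, dev_x, dev_y)
    print(tb)
    print_scores(model, test_file, encoder)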