This is a research internship that I completed during the first year of my master's degree, supervised by Tita Kyriacopoulou, Claude Martineau and Philippe Gambette. During this 3-month internship, I continued the development of an evaluation module for Unitex/GramLab.

Many natural language processing tasks, in particular information extraction, for which the Unitex/GramLab software is used, are evaluated by comparing the system output with manual annotation, using precision, recall and F-score. A module that compares two annotated files containing the same text but different annotations (for example, annotated manually in the first file and automatically in the second) would therefore be a useful addition to Unitex/GramLab.
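To make these three measures concrete, here is a minimal sketch in Java. It is not Gemini's code: the `AnnotationScores` class, the `Annotation` record and the `score` method are hypothetical names, and I assume the strictest matching criterion, where two annotations match only if they have the same offsets and the same label. It needs Java 16 or later because it uses a record.

```java
import java.util.HashSet;
import java.util.Set;

public class AnnotationScores {

    /** Hypothetical annotation: character offsets plus a label. Records compare by value. */
    record Annotation(int start, int end, String label) {}

    /**
     * Returns {precision, recall, F-score} of the predicted annotations
     * against the gold (manual) ones, assuming exact-match comparison.
     */
    static double[] score(Set<Annotation> gold, Set<Annotation> predicted) {
        // True positives: annotations present in both sets.
        Set<Annotation> matched = new HashSet<>(gold);
        matched.retainAll(predicted);
        int truePositives = matched.size();

        // Guard against division by zero when a set is empty.
        double precision = predicted.isEmpty() ? 0.0 : (double) truePositives / predicted.size();
        double recall = gold.isEmpty() ? 0.0 : (double) truePositives / gold.size();

        // F-score (F1): harmonic mean of precision and recall.
        double sum = precision + recall;
        double fScore = sum == 0.0 ? 0.0 : 2 * precision * recall / sum;

        return new double[] {precision, recall, fScore};
    }

    public static void main(String[] args) {
        Set<Annotation> gold = Set.of(
                new Annotation(0, 5, "PER"),
                new Annotation(10, 15, "LOC"),
                new Annotation(20, 26, "ORG"));
        Set<Annotation> predicted = Set.of(
                new Annotation(0, 5, "PER"),   // same span, same label: a match
                new Annotation(10, 15, "ORG")); // same span, different label: no match

        double[] s = score(gold, predicted);
        System.out.println("precision=" + s[0] + " recall=" + s[1] + " f-score=" + s[2]);
    }
}
```

In this example only one of the two predicted annotations matches one of the three manual annotations, so precision is 1/2, recall is 1/3 and the F-score is 0.4. A looser criterion (for instance, counting overlapping spans as partial matches) would give different numbers.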

A first step in this direction was the release of Gemini, a free software tool written in Java. It computes various metrics, whose meaning is detailed in the tool's presentation on GitHub, to compare two annotated text files (in BRAT or XML format):

Gemini also lets you explore the comparison result visually on a web page, where the matched annotations are shown in two different colors:

Finally, it lets you export a spreadsheet file containing all pairs of annotations: