SemEval 2010

Latest news

Important dates

General task description





Latest news
  • NEW!! January 24, 2011: LDC has released the SemEval OntoNotes English corpus (catalog number LDC2011T01).
  • September 27, 2010: All the task datasets (except for English, which will be released in early 2011) are now publicly available from the Download area.
  • August 19, 2010: Task datasets will be released shortly.
  • August 19, 2010: Scorer version 1.04 is available.
  • June 16, 2010: System results and outputs are available.
  • June 16, 2010: The task description paper, to be presented at SemEval-2010 on July 15, is available.
  • April 3, 2010: Submission is closed.
  • March 30, 2010: Please make sure to upload the output files of your system according to the instructions specified in the README file of the test distribution. This same file describes the different evaluation scenarios (gold versus regular, open versus closed).
  • March 20, 2010: Test data are available from the Download area. Submit your results before April 2!
  • March 2, 2010: A new version of the German corpus (completing the coreference annotation of all files) is now available for download from the same place. 
  • Feb 27, 2010: Latest scorer version (1.02) available for download. From now on, versions will be numbered.
  • Feb 26, 2010: A bug related to gold markables in the German corpus has been detected. Also, the new version of the scorer (1.02) fixes a bug affecting the B-cubed metric. The two updates will be available by March 1.
  • Feb 11, 2010: Training data are now available from the Download area.
  • Feb 11, 2010: Test data release on March 20, and the competition closes on April 2.
  • Feb 10, 2010: Training data release on Feb 11. All are welcome to participate!
  • Feb 2, 2010: Training data release on Feb 10. 
  • Dec 26, 2009: Trial data v2 is available from the SemEval site. It includes data for Dutch, German and Italian, and updated versions for Catalan, English and Spanish.
  • Nov 11, 2009: Three more languages added to the task: Dutch, German and Italian. Trial data for them will be released shortly.
  • Oct 27, 2009: Change of the English corpora. Training and trial data will be drawn from OntoNotes and ARRAU. Both gold-standard and automatically annotated information will be provided. The new release will be announced shortly.
  • Sep 15, 2009: Trial data is available from the SemEval site.
  • Sep 10, 2009: Postponed - Trial data will be released in five days.
  • Sep 8, 2009: Scorers available from Download area.
  • Aug 28, 2009: Trial data will be released Sep 8, 2009.
  • Aug 11, 2009: Join our mailing list to stay up to date!
  • Aug 11, 2009: Use the forum to contact the organizers and/or participants and leave feedback.
  • June 4, 2009: Poster presented at SEW-2009 (Workshop on Semantic Evaluations: Recent Achievements and Future Directions) - SemEval-2010 Task 1: Coreference Resolution in Multiple Languages
  • Nov 30, 2008: This new website has been posted. Welcome to the task!
  • Nov 5, 2008: Changes to the website.




Important dates
  • Training data release: February 11
  • Test data release: March 20
  • Time constraint: Upload the results no later than 7 days after downloading the test set
  • Closing competition: April 2
  • System description papers: April 17
  • Notification of acceptance: May 6
  • Camera-ready papers: May 16





General task description
Using coreference information has been shown to be beneficial in a number of NLP applications including Information Extraction, Text Summarization, Question Answering and Machine Translation. This task is concerned with automatic coreference resolution for six different languages: Catalan, Dutch, English, German, Italian and Spanish. Two tasks are proposed for each of the languages: 

  • Full task. Detection of full coreference chains, composed of named entities, pronouns, and full noun phrases.
  • CANCELLED Subtask. Pronominal resolution, i.e., finding the antecedents of the pronouns in the text. 

In particular, we aim:

(i) To study the portability of coreference resolution systems across languages (Catalan, Dutch, English, German, Italian, Spanish)

  • To what extent is it possible to implement a general system that is portable to all six languages?
  • How much language-specific tuning is necessary?
  • Are there significant differences between Germanic and Romance languages? And between languages of the same family?

(ii) To compare four different evaluation metrics (MUC, B-CUBED, CEAF and BLANC) for coreference resolution.

  • Do all evaluation metrics provide the same ranking? Is there one that provides a more accurate picture of a system's accuracy?
  • Is there a strong correlation between them? 
  • Can statistical systems be optimized under all four metrics at the same time?
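To make the contrast between metrics concrete, here is a small illustrative sketch (not the official task scorer) of one of the four metrics, B-cubed, which scores each mention by the overlap between its cluster in the system response and its cluster in the gold key. The function name and the toy partitions below are our own; the official scorer additionally handles twinless mentions and other corner cases.

```python
# Illustrative B-cubed precision/recall/F1 for coreference partitions.
# A partition is a list of sets of mention ids; key and response are
# assumed to cover the same set of mentions (the simplest setting).

def b_cubed(key, response):
    def avg_overlap(partition_a, partition_b):
        # For each mention, the fraction of its cluster in A that is also
        # in its cluster in B; averaged over all mentions.
        cluster_of = {m: c for c in partition_b for m in c}
        total, n = 0.0, 0
        for c in partition_a:
            for m in c:
                total += len(c & cluster_of[m]) / len(c)
                n += 1
        return total / n

    precision = avg_overlap(response, key)   # response clusters vs. key
    recall = avg_overlap(key, response)      # key clusters vs. response
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with key [{1,2,3},{4,5}] and response [{1,2},{3,4,5}], both precision and recall come out to 11/15 (about 0.733). MUC, by contrast, counts missing/spurious links between clusters, and CEAF scores an optimal one-to-one alignment of key and response clusters, which is why the four metrics can rank the same systems differently.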

Although we target general systems that address the full multilingual task, participants may take part in any task/subtask for any subset of the languages.


For further details see the sections:

Task Description

Datasets and Formats





Organizers
  • Véronique Hoste (Hogeschool Gent)
  • Lluís Màrquez (TALP, Universitat Politècnica de Catalunya)
  • M. Antònia Martí (CLiC, University of Barcelona)
  • Massimo Poesio (University of Essex / Università di Trento)
  • Marta Recasens (CLiC, University of Barcelona)
  • Emili Sapena (TALP, Universitat Politècnica de Catalunya)
  • Mariona Taulé (CLiC, University of Barcelona)
  • Yannick Versley (Universität Tübingen)
  • Other people behind the preparation of the corpora: Manuel Bertran (UB), Oriol Borrega (UB), Francesca Delogu (U.Trento), Jesús Giménez (UPC), Eduard Hovy (ISI-USC), Richard Johansson (U.Trento), Xavier Lluís (UPC), Montse Nofre (UB), Lluís Padró (UPC), Kepa Rodríguez (U.Trento), Mihai Surdeanu (U.Stanford), Olga Uryupina (U.Trento), Lente Van Leuven (UB) and Rita Zaragoza (UB).



For queries, feedback or more information, feel free to post in the forum.