
WP4 reports on technologies and tools for corpus creation, normalisation and annotation

04 Apr 2011
The list of tools to integrate into PANACEA has been defined, and the corresponding workflows have been designed.
The overall objective of WP4 is to develop components for the automatic acquisition, normalisation and linguistic annotation of language resources. Integrated into the PANACEA Factory of Language Resources as web services, these components will provide the annotated data needed by the technologies of WP5 (Parallel Corpus and Derivatives) and WP6 (Lexical Acquisition).

To this end, deliverable D4.1 reviews several state-of-the-art, language-independent tools for the acquisition and normalisation of web corpora. It also summarises state-of-the-art NLP tools for each language targeted by the project. In addition, the deliverable reports the results of a survey covering several aspects (efficiency, I/O formats, tagset semantics, etc.) of the NLP tools already available to PANACEA partners.

Based on these findings, we decided on the list of tools to integrate into the PANACEA Factory, and we designed workflows for the production of annotated monolingual and bilingual resources.

For more information, please refer to the full-text PDF version of D4.1.