WP4 reports on technologies and tools for corpus creation, normalization and annotation
To this end, D4.1 reviews several state-of-the-art, language-independent tools for the acquisition and normalisation of web corpora. It also provides summaries of state-of-the-art NLP tools for each language targeted by the project. Moreover, this deliverable reports the results of a survey on several aspects (efficiency, I/O formats, tagset semantics, etc.) of the NLP tools already available to PANACEA partners.
Based on these findings, we decided on the list of tools to integrate into the PANACEA factory, and we designed workflows for the production of annotated monolingual and bilingual resources.
For more information, please refer to the full-text PDF version of D4.1.