MT SUMMIT 2013 PANACEA TUTORIAL
NEW! The tutorial presentation is now available.
DESCRIPTION:
Although MT technologies may consist of language-independent engines, their real-life implementation depends heavily on the availability of language-dependent knowledge, i.e., they require Language Resources (LRs). In order to equip MT for every pair of European languages, for every domain, and for every text genre, appropriate LRs covering every language, domain and genre must be produced. Moreover, a Language Resource for a given language can never be considered complete or final: languages change and new knowledge domains emerge at a rapid pace. Traditionally, LRs have been produced by hand, and the high cost of doing so (highly skilled human work and development time) has hindered full coverage.
The PANACEA project has focused on the development of a factory of LRs that automates the stages involved in the acquisition, production, updating and maintenance of the LRs required by MT systems and by other applications based on Language Technologies (LT). This automation is meant to cut down costs significantly, in terms of time and human effort. Such reductions are the only way to guarantee a continuous supply of the LRs that MT and other Language Technologies may demand in a multilingual Europe. In order to address this objective, PANACEA has worked on (i) the development of a platform, designed as a dedicated factory for the composition of a number of LR production lines based on combinations of different web services, and (ii) the integration of advanced components for the acquisition and normalization of monolingual and parallel corpora and their alignment; the derivation of bilingual dictionaries from aligned corpora; and the production of information-rich monolingual lexica using corpus-based automatic methods.
The aim of this tutorial is to introduce the audience to the PANACEA platform in particular, and to workflow-oriented tasks based on available web services for the production of LRs for MT in general. The PANACEA platform is an interoperability space designed to help users access remotely deployed tools. Different service providers (institutions, universities, companies, etc.) have NLP tools that have been deployed as web services thanks to the infrastructure (platform) provided by the PANACEA project. These services can be freely accessed by users who want to test those tools or process their own data. Web services can be combined to create complex chains called workflows, which can be designed and run using Taverna (a Java workbench).
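The idea of combining services into a workflow can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the three functions below are hypothetical local stand-ins, not actual PANACEA services, and in the platform itself each step would be a call to a remotely deployed web service, with Taverna handling the composition.

```python
from functools import reduce
from typing import Callable, List

# Assumption: a "service" is modelled as any callable from text to text.
# Real PANACEA services are remote web services; these are local stand-ins.
Service = Callable[[str], str]


def normalize(text: str) -> str:
    """Hypothetical stand-in: collapse runs of whitespace."""
    return " ".join(text.split())


def split_sentences(text: str) -> str:
    """Hypothetical stand-in: naive split, one sentence per line."""
    return text.replace(". ", ".\n")


def tokenize(text: str) -> str:
    """Hypothetical stand-in: separate full stops from the preceding word."""
    lines = []
    for line in text.splitlines():
        lines.append(" ".join(line.replace(".", " .").split()))
    return "\n".join(lines)


def make_workflow(services: List[Service]) -> Service:
    """Chain services so that each one consumes the previous one's output."""
    def run(data: str) -> str:
        return reduce(lambda acc, service: service(acc), services, data)
    return run


workflow = make_workflow([normalize, split_sentences, tokenize])
result = workflow("  MT needs   resources. Resources are costly.  ")
print(result)
```

Because every step shares the same input and output type, steps can be reordered or replaced without touching the rest of the chain, which is the property that makes independently provided services composable.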
For those interested in sharing their tools as web services (free or authenticated), the tutorial will also show how to become a service provider following the PANACEA guidelines.
Outline
Introduction to PANACEA Web Sites
PANACEA Web Sites from IULA UPF on Vimeo.
We recommend watching the videos in high definition and in full-screen mode.