Seventh Framework Programme (cordis.europa.eu)

MT SUMMIT 2013 PANACEA TUTORIAL

An architecture based on web services for the rapid deployment of workflows for Machine Translation
4-hour tutorial by Marc Poch (Universitat Pompeu Fabra) and Antonio Toral (Dublin City University)
www.mtsummit2013.info

NEW! The tutorial presentation is now available.

DESCRIPTION:

Although MT technologies may consist of language-independent engines, their real-life implementation depends heavily on the availability of language-dependent knowledge, i.e., Language Resources (LRs). To equip MT for every pair of European languages, every domain, and every text genre, appropriate LRs covering every language, domain, and genre must be produced. Moreover, a Language Resource for a given language can never be considered complete or final: languages change, and new knowledge domains emerge at a rapid pace. Traditionally, LRs have been produced by hand, and the high cost of doing so (highly skilled human work and long development time) has hindered full coverage.

The PANACEA project has focused on the development of a factory of LRs that automates the stages involved in the acquisition, production, updating, and maintenance of the LRs required by MT systems and by other Language Technology (LT) applications. This automation is meant to cut costs significantly in terms of time and human effort. Such reductions are the only way to guarantee the continuous supply of LRs that MT and other Language Technologies may demand in a multilingual Europe. To address this objective, PANACEA has worked on (i) the development of a platform, designed as a dedicated factory for composing a number of LR production lines from combinations of different web services, and (ii) the integration of advanced components for the acquisition and normalization of monolingual and parallel corpora and the alignment of parallel corpora; the derivation of bilingual dictionaries from aligned corpora; and the production of information-rich monolingual lexica using corpus-based automatic methods.
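As a rough illustration of what deriving a bilingual dictionary from aligned corpora involves, the sketch below scores candidate word translations with the Dice coefficient over sentence-aligned pairs. This is a deliberately simple, textbook co-occurrence technique chosen for illustration, not the method used by the PANACEA components; the function name `dice_dictionary` and its `min_score`/`min_count` thresholds are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_dictionary(pairs, min_score=0.5, min_count=2):
    """Toy bilingual dictionary induction from sentence-aligned pairs.

    Scores each (source word, target word) candidate with the Dice
    coefficient 2 * c(s, t) / (c(s) + c(t)), where every count is the
    number of sentence pairs in which the word(s) occur.
    """
    src_counts, tgt_counts, joint_counts = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Count each word at most once per sentence (sets, not lists).
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        joint_counts.update(product(src_words, tgt_words))

    dictionary = {}
    for (s, t), c in joint_counts.items():
        if c < min_count:  # ignore candidates seen too rarely
            continue
        score = 2 * c / (src_counts[s] + tgt_counts[t])
        # Keep only the best-scoring translation for each source word.
        if score >= min_score and score > dictionary.get(s, ("", 0.0))[1]:
            dictionary[s] = (t, score)
    return dictionary


pairs = [("the house", "la casa"), ("the car", "el coche"),
         ("a house", "una casa"), ("a car", "un coche")]
print(dice_dictionary(pairs))
```

On this tiny corpus the function keeps `house` → `casa` and `car` → `coche`, each with a Dice score of 1.0, and discards candidates such as `the` → `la` that co-occur in only one sentence pair. Realistic dictionary extraction relies on statistical word alignment rather than raw sentence-level co-occurrence.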

The aim of this tutorial is to introduce the audience to the PANACEA platform in particular, and to workflow-oriented tasks based on available web services for the production of LRs for MT in general. The PANACEA platform is an interoperability space designed to help users access remotely deployed tools. Different service providers (institutions, universities, companies, etc.) have deployed their NLP tools as web services thanks to the infrastructure (platform) provided by the PANACEA project. These services can be freely accessed by users who want to test the tools or process their own data. Web services can be combined into complex chains called workflows, which can be designed and run using Taverna, a Java-based workflow workbench.
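The idea of chaining services into a workflow can be sketched in a few lines. In the hypothetical example below, each "service" is a local Python function standing in for a remote web service, and `run_workflow` feeds each step's output into the next step's input while keeping a trace of intermediate results. This is a linear simplification for illustration only: Taverna workflows are dataflow graphs whose services are invoked remotely, and none of the names here belong to PANACEA or Taverna.

```python
import re
from typing import Callable, List, Tuple

# In this sketch a "service" is any callable with one input and one output.
Service = Callable[[object], object]


def run_workflow(steps: List[Tuple[str, Service]], data):
    """Run services in order, feeding each output into the next step.

    Returns the final output plus a trace of (step name, output) pairs,
    loosely mirroring how a workflow tool exposes intermediate results.
    """
    trace = []
    for name, service in steps:
        data = service(data)
        trace.append((name, data))
    return data, trace


# Hypothetical local stand-ins for remotely deployed services.
def sentence_splitter(text: str) -> List[str]:
    """Naive splitter: break on periods and restore the final period."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def tokenizer(sentences: List[str]) -> List[List[str]]:
    """Separate words from punctuation in each sentence."""
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences]


result, trace = run_workflow(
    [("split", sentence_splitter), ("tokenize", tokenizer)],
    "Hello world. Bye.",
)
print(result)  # [['Hello', 'world', '.'], ['Bye', '.']]
```

Keeping the trace makes it easy to inspect what each step produced, which is useful when one service in a chain returns unexpected output.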

The tutorial will also show those interested in sharing their tools as web services (free or authenticated) how to become a service provider following the PANACEA guidelines.


Outline
  • The PANACEA platform: an introduction to the PANACEA platform and its web portals.
  • The PANACEA Registry: the portal where all the deployed web services can be found. There are more than 150 services.
  • Running a web service from Spinet: the first and easiest way to call a PANACEA service, using a simple web form to see how tools can be run.
  • Taverna: running a workflow. Shows users how to run workflows in Taverna.
  • The PANACEA myExperiment portal: more than 70 workflows ready to use, modify, or take as examples for designing new ones.
  • From noisy crawled parallel data to a translation memory in TMX. A concrete example of a workflow involving bilingual crawling and alignment.
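The last stage of that example, turning aligned sentence pairs into a TMX translation memory, can be sketched with Python's standard `xml.etree.ElementTree`. The noise filter below (rejecting empty, untranslated, or badly length-mismatched pairs) is a hypothetical heuristic with an assumed `max_ratio` threshold, not the filtering performed by the PANACEA workflow; the elements (`tmx`, `header`, `body`, `tu`, `tuv`, `seg`) follow the TMX 1.4 structure, and the header values are placeholders.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced key as the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def is_clean(src: str, tgt: str, max_ratio: float = 2.0) -> bool:
    """Hypothetical noise filter for crawled sentence pairs."""
    src, tgt = src.strip(), tgt.strip()
    if not src or not tgt or src == tgt:
        return False  # empty side, or an untranslated copy of the source
    # Reject pairs whose character lengths differ too much.
    return max(len(src), len(tgt)) / min(len(src), len(tgt)) <= max_ratio


def pairs_to_tmx(pairs, src_lang: str, tgt_lang: str) -> str:
    """Build a TMX 1.4 document (as a string) from pairs that pass the filter."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tmx-sketch",  # placeholder header values
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        if not is_clean(src, tgt):
            continue
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text.strip()
    return ET.tostring(tmx, encoding="unicode")


crawled = [
    ("Hello world.", "Hola mundo."),
    ("Home", "Home"),  # untranslated: dropped
    ("Yes", "This line is far too long to be a translation of yes."),  # dropped
]
print(pairs_to_tmx(crawled, "en", "es"))
```

Only the first pair survives the filter, so the resulting memory contains a single translation unit; the returned string can then be saved to a `.tmx` file.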



    Video: Introduction to PANACEA Web Sites (IULA UPF, hosted on Vimeo)

    We recommend watching the videos in high definition and in full-screen mode.