PANACEA Environment Italian Monolingual Corpus

This data set is the Italian part of the second version of the monolingual corpus (MCv2) acquired in the framework of PANACEA, an EU FP7-funded project under Grant Agreement 248064. The data set contains documents that were acquired from the web, automatically identified as being in Italian, and automatically classified as relevant to the "ENVironment" (ENV) domain.

N-gram lists and dependency-parsed versions of this corpus are also available.

Size information:

  • tokens: 40,044,852

  • Download location

    DISCLAIMER: The right to use the sentences contained in this data set has been granted by their copyright holders. Use is restricted to research purposes, and no profit may be made from it. We are grateful to all sources for their kind and generous contribution.

    For further information on these sources, please see: Acknowledgements

    This resource is distributed under the following licence: CC BY-NC-SA