New Results from Nominal Classification Experiments
We have recently run nominal classification experiments using our new gold standards for English and Spanish.
Here we summarize the results obtained on the noun classification task with these gold standards. We carried out the experiments using Decision Trees (see Bel et al., 2010) and extracted noun occurrences from the Corpus Tècnic de l'IULA (Cabré et al., 2006). This corpus contains written texts from the fields of Law, Economy, Genomics, Medicine, and Environment, as well as a contrastive newspaper corpus. For Spanish, we used the 21M-token newspaper corpus and the Economy corpus (1 million words) to study domain-specific and sparse-data problems. For English, we used texts from different domains (Economy, Medicine, Computer Science, and Environmental issues), totaling about 3.2M tokens.
Results for English
Class | Acc. % | FP % | FN %
Abstract | 71.36 | 9.86 | 18.78
Human | 80.48 | 5.16 | 14.36
Non-deverbal eventive nouns | 80.24 | 1.80 | 17.96
Results for Spanish
Class | General Corpus Acc. % | General Corpus FP % | General Corpus FN % | Economy Corpus Acc. % | Economy Corpus FP % | Economy Corpus FN %
Human | 78.14 | 9.13 | 12.74 | 71.70 | 5.25 | 23.05
Semiotics | 71.46 | 7.94 | 20.60 | 65.54 | 10.46 | 24.00
Non-deverbal eventive nouns | 86.43 | 3.02 | 10.55 | 68.55 | 18.87 | 12.58
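
In every row of the tables above, the accuracy, false positive (FP), and false negative (FN) figures add up to roughly 100. This suggests, though the text does not state it, that each figure is a percentage of all nouns evaluated for that class. The sketch below illustrates that reading for a single binary class. The function name `rates` and the toy labels are hypothetical and are not taken from the experiments.

```python
def rates(gold, predicted):
    """Return (accuracy, FP, FN) as percentages of all evaluated nouns.

    Assumption: each noun has a binary gold label (member of the class or
    not) and a binary predicted label, and all three figures share the
    same denominator, so they sum to 100.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    total = len(gold)
    if total == 0:
        raise ValueError("at least one noun is required")

    # False positive: predicted as a class member, but not one in the gold standard.
    fp = sum(1 for g, p in zip(gold, predicted) if p and not g)
    # False negative: a class member in the gold standard, but not predicted as one.
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    correct = total - fp - fn

    return tuple(round(100.0 * n / total, 2) for n in (correct, fp, fn))


# Toy example with five nouns (illustrative labels only).
gold = [True, True, True, False, False]
predicted = [True, False, True, True, False]
acc, fp_rate, fn_rate = rates(gold, predicted)
print(acc, fp_rate, fn_rate)
```

Under this reading, a low FP rate paired with a higher FN rate, as for the English non-deverbal eventive nouns, means the classifier rarely assigns the class wrongly but misses a larger share of its true members.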