TermiGai is a tool developed by UZEI that automatically identifies and analyses the term candidates contained in Basque and Spanish text corpora.
Bases of the analysis
TermiGai identifies the term candidates in a sample text using a combination of linguistic and statistical methods. For this analysis it uses the automatic lemmatizers euLEMA and esLEMA and the multiword lexical unit extractor Koloka.
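As a rough illustration of how a linguistic step and a statistical step can be combined, the sketch below lemmatizes tokens with a toy lookup table and then keeps the lemmas that occur often enough. It is not TermiGai's implementation: the `TOY_LEMMAS` table is a hypothetical stand-in for a real lemmatizer such as euLEMA or esLEMA, whose interfaces are not described here, and the `min_freq` cut-off is an assumed parameter.

```python
from collections import Counter

# Hypothetical stand-in for a lemmatizer: maps inflected forms to lemmas.
TOY_LEMMAS = {"términos": "término", "diccionarios": "diccionario"}


def lemmatize(token):
    """Linguistic step: lowercase the token and look up its lemma."""
    token = token.lower()
    return TOY_LEMMAS.get(token, token)


def candidate_lemmas(tokens, min_freq=2):
    """Statistical step: keep lemmas seen at least min_freq times (assumed cut-off)."""
    counts = Counter(lemmatize(t) for t in tokens)
    return {lemma: n for lemma, n in counts.items() if n >= min_freq}


tokens = ["términos", "término", "Términos", "diccionarios", "texto"]
print(candidate_lemmas(tokens))
```

Counting lemmas rather than surface forms lets the three inflected variants in the sample count as a single candidate.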
This tool can be used in a variety of ways, depending on the characteristics of each text or the results sought. Below are the main functionalities of TermiGai:
- TermiGai can carry out its analysis using only a general lexicon or the general lexicon together with the specialist one.
- In the case of multiword lexical units, the confidence level can be chosen. The higher the level selected, the more reliable the candidates proposed by TermiGai, although it proposes fewer of them.
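The trade-off between confidence level and number of candidates can be pictured as a simple threshold filter. The sketch below is illustrative only and assumes a hypothetical data model, since the source does not describe how TermiGai scores candidates: each multiword candidate carries a made-up `confidence` value between 0 and 1, and `select` keeps those that reach the chosen level.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    confidence: float  # hypothetical score in [0, 1], not TermiGai's actual measure


def select(candidates, level):
    """Keep only the candidates whose score reaches the chosen level."""
    return [c for c in candidates if c.confidence >= level]


candidates = [
    Candidate("corpus de textos", 0.9),
    Candidate("unidad léxica", 0.7),
    Candidate("gran ayuda", 0.3),
]

# Raising the level shrinks the list but leaves the higher-scoring candidates.
for level in (0.2, 0.5, 0.8):
    kept = select(candidates, level)
    print(level, [c.text for c in kept])
```

With these invented scores, the lowest level keeps all three candidates and the highest keeps only one.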
TermiGai is of great help in drawing up new terminology dictionaries or updating the terminology of a specific field, as it provides term candidates in Basque and Spanish by automatically processing large volumes of text.
The result of the analysis helps the terminologist in their study of large text corpora.