VOCABULARY AND DICTIONARIES
UZEI is a pioneer in the development and processing of general and specialized lexicons. Thanks to its long history in this field, it offers a variety of services and options.
You will find these options in the blocks below, together with the NLP resources and tools used by UZEI.
General Vocabulary
UZEI has broad experience in creating dictionaries.
General vocabulary dictionaries
In addition to its work for Euskaltzaindia, the Royal Academy of the Basque Language (the Unified Dictionary and the Lexical Observatory, among others), UZEI has compiled several complementary dictionaries, which it offers online or as plugins: the Basque Frequency Dictionary, Atzekoz aurrera – Dictionary of Terminations in Basque, and the Dictionary of Synonyms.
UZEI also runs ongoing projects to adapt the dictionaries of other organizations and authors to standardized formats, in particular XML based on TEI (Text Encoding Initiative).
UZEI's lexical database of Basque and its tools for correcting and verifying the lexicon (Hobelex, IDITE) are among the basic tools used for these tasks.
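As a rough illustration of what adapting a dictionary record to XML based on TEI can involve, the sketch below builds a single entry using element names from the TEI dictionaries module (entry, form, orth, gramGrp, pos, sense, def). The function name entry_to_tei and the sample record are hypothetical and do not describe UZEI's actual tools or data.

```python
import xml.etree.ElementTree as ET

# Namespace of TEI P5 documents.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def entry_to_tei(headword, pos, definition):
    """Build a TEI <entry> element from a flat dictionary record (hypothetical helper)."""
    def tag(name):
        return f"{{{TEI_NS}}}{name}"

    entry = ET.Element(tag("entry"))
    form = ET.SubElement(entry, tag("form"), {"type": "lemma"})
    ET.SubElement(form, tag("orth")).text = headword
    gram = ET.SubElement(entry, tag("gramGrp"))
    ET.SubElement(gram, tag("pos")).text = pos
    sense = ET.SubElement(entry, tag("sense"))
    ET.SubElement(sense, tag("def")).text = definition
    return entry


# Illustrative record only; not taken from any UZEI dictionary.
record = {"headword": "hiztegi", "pos": "noun", "definition": "dictionary"}
entry = entry_to_tei(**record)
print(ET.tostring(entry, encoding="unicode"))
```

A real conversion would also need to handle multiple senses, cross-references, and source-specific fields, which this sketch leaves out.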
Terminology
Throughout its history, UZEI has compiled a large number of terminology dictionaries, which make up the core of EUSKALTERM. It is a pioneer and specialist in compiling specialized vocabulary in all fields (Kontabilitatea eta Auditoretza Hiztegia ‘Dictionary of Accounting and Auditing’, MZT Hiztegia ‘Dictionary of Materials Science and Technology’, Farmazia Hiztegia ‘Pharmacy Dictionary’, Proiektu Zuzendaritza Lexikoa ‘Lexicon of Project Management’, COVID-19aren hiztegi eleaniztuna ‘Multilingual Dictionary of COVID-19’…).
In addition, in collaboration with Euskaltzaindia, UZEI has developed basic lexicons (Koronabirusaren oinarrizko lexikoa ‘Basic Lexicon of Coronavirus’, Telelanaren oinarrizko lexikoa ‘Basic Lexicon of Teleworking’, Klima-aldaketaren oinarrizko lexikoa ‘Basic Lexicon of Climate Change’, Zibersegurtasunaren oinarrizko lexikoa ‘Basic Lexicon of Cybersecurity’…).
Technologies developed by UZEI have contributed to all these tasks: tools that support terminology detection and extraction (Termigai, LEX2, Koloka), an up-to-date database of terminological criteria (UTH), and lexical correction and verification tools (IDITE). All the tools UZEI uses were created by the company itself.
See the map Terminology in Europe
Vocabulary Correction and Verification
UZEI has developed the lexical checker IDITE for correcting and checking vocabulary.
This tool allows you to write and correct texts in Basque in accordance with the rules and recommendations for standard vocabulary, and to bring older texts into line with present-day Basque.
It can analyze large quantities of text. In addition to correcting the errors that a typical spell checker handles, it makes usage suggestions based on the recommendations of the regulatory authorities, namely Euskaltzaindia (the Academy of the Basque Language) and the Terminology Committee.
Classifying Documents
Once the words that make up a text have been extracted and lemmatized, a classifier can automatically determine the subject matter of the text or document.
This requires an extensive dictionary or database of terms, classified and labelled by subject matter, against which the words extracted from a given text can be compared.
UZEI has developed a tool capable of classifying texts automatically: the text classifier Gaika.
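The general idea described above, comparing the lemmas of a text against a dictionary of terms labelled by subject, can be sketched as follows. This is a minimal illustration and not Gaika's implementation: the TERM_SUBJECTS table, its English sample terms, and the classify function are all hypothetical, and the input is assumed to be already lemmatized.

```python
from collections import Counter

# Hypothetical labelled term dictionary: lemma -> subject field.
TERM_SUBJECTS = {
    "audit": "accounting",
    "ledger": "accounting",
    "invoice": "accounting",
    "alloy": "materials science",
    "polymer": "materials science",
    "vaccine": "pharmacy",
    "dosage": "pharmacy",
}


def classify(lemmas, term_subjects=TERM_SUBJECTS):
    """Return the subject whose terms match the most lemmas, or None if nothing matches."""
    counts = Counter(
        term_subjects[lemma] for lemma in lemmas if lemma in term_subjects
    )
    if not counts:
        return None
    # On a tie, Counter.most_common keeps the subject that was seen first.
    return counts.most_common(1)[0][0]


# Assumed to be the output of an earlier lemmatization step.
lemmas = ["the", "audit", "check", "each", "invoice", "and", "ledger", "alloy"]
subject = classify(lemmas)
print(subject)  # accounting (three accounting terms against one materials science term)
```

Simple counting treats every term as equally informative; a production classifier would more plausibly weight terms by how specific they are to a subject and handle multiword terms.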