Terminology Extraction (TE) consists of identifying term candidates (TCs) in a specific text or collection of texts (a corpus), together with relevant information about the concepts or usage of those terms, such as definitions and contexts. TE can be performed automatically using Terminology Extraction Tools (TETs).
TE should not be confused with terminology identification, which is the recognition of new or existing terms by comparing term candidate lists (the output of TETs) against an existing terminology database in order to distinguish known terms from unknown ones.
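Terminology identification amounts to looking each candidate up in the termbase. The following is a minimal sketch, assuming the termbase can be loaded as a plain collection of term strings and that matching only needs to ignore case and extra whitespace; real termbases also deal with inflected forms, variants and language pairs. The names `normalise` and `identify_terms` are hypothetical.

```python
def normalise(term):
    """Case-fold and collapse internal whitespace so trivial variants match."""
    return " ".join(term.casefold().split())


def identify_terms(candidates, termbase):
    """Split a term candidate list into (known, unknown) lists.

    A candidate is "known" if its normalised form matches a normalised
    termbase entry; the original candidate order is preserved.
    """
    known_forms = {normalise(entry) for entry in termbase}
    known, unknown = [], []
    for candidate in candidates:
        if normalise(candidate) in known_forms:
            known.append(candidate)
        else:
            unknown.append(candidate)
    return known, unknown
```

For example, `identify_terms(["Machine  Translation", "post-editing", "translation memory"], {"machine translation", "translation memory"})` returns `(["Machine  Translation", "translation memory"], ["post-editing"])`. The unknown list is what a terminologist would typically examine as potential new entries.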
TE is a quick way to acquire knowledge of a subject field and its specialised language. However, once tools have extracted the term candidates, human specialists still have to decide whether the results are appropriate. Performing TE regularly is also a way of keeping up to date with recent terminological developments in a particular subject field (The Pavel Terminology Tutorial, 2006).
TE is considered part of terminology work and is an important task when creating terminology databases. It allows possible terms to be identified quickly, so that language specialists can retain them for further study, validation and integration into databases. Another advantage of performing automatic TE with TETs when creating termbases is that the results can be saved in various formats, which can later be imported easily into other databases.
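To illustrate saving results in an importable format, here is a sketch that writes (term, frequency) pairs as CSV text using Python's standard `csv` module. The two-column `term`/`frequency` layout and the function name `candidates_to_csv` are assumptions made for this example; dedicated exchange formats such as TBX (ISO 30042) carry a much richer entry structure.

```python
import csv
import io


def candidates_to_csv(candidates):
    """Serialise (term, frequency) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["term", "frequency"])
    # csv.writer quotes fields that contain commas or quotes automatically.
    writer.writerows(candidates)
    return buffer.getvalue()
```

Because the output is standard CSV, it can be read back with `csv.reader` or loaded by most spreadsheet and database import tools; a term containing a comma, such as `"read, write"`, is quoted so it survives the round trip intact.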