ICT Skills 1

Do you know what character encoding is? (7/31)


For a computer to process and interpret data, the data must be encoded. Character encoding is the representation of characters as values that a computer can store and process. More formally, it is a mapping from one set of characters onto another set of symbols, for example from the characters of a natural language (such as an alphabet) to numbers or electrical pulses.

Because computers cannot work directly with the characters used in human writing systems, each character is associated with a binary value. Character encoding is what makes this possible: every character used in human writing has to be encoded before a computer can recognise it. Characters may be encoded differently on different operating systems, for example Microsoft Windows and Apple Macintosh.
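The idea can be illustrated with a minimal Python sketch. It shows the binary values behind two letters, and then how one accented character gets a different byte value under two legacy encodings. The choice of `cp1252` (a Western European Windows code page) and `mac_roman` (the classic Mac OS encoding) is an assumption made for illustration; the text above does not name specific encodings.

```python
# Each character is stored as a number, shown here as an 8-bit binary string.
text = "Hi"
bits = [format(ord(ch), "08b") for ch in text]
print(bits)  # ['01001000', '01101001']

# The same character can map to a different byte under different encodings.
windows_hex = "é".encode("cp1252").hex()    # Windows Western European code page
mac_hex = "é".encode("mac_roman").hex()     # classic Mac OS Roman
print(windows_hex, mac_hex)  # e9 8e
```

A file that stores "é" as the byte `e9` is therefore read correctly only by software that assumes the same encoding the writer used.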

The relationship between a character and its encoding is established by assigning each character a numerical value called a code point. The number of available code points depends on the number of bits used: an 8-bit coded character set can encode at most 2^8 = 256 characters.
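As a rough sketch of these two points, the Python snippet below looks up a code point and checks what fits in an 8-bit character set. The helper names `capacity` and `fits` are hypothetical, and Latin-1 (ISO 8859-1) is used only as one example of an 8-bit set.

```python
def capacity(bits: int) -> int:
    """Number of distinct code points that `bits` bits can represent."""
    return 2 ** bits

def fits(char: str, encoding: str) -> bool:
    """Return True if `char` can be represented in `encoding`."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

code_point = ord("A")     # character -> code point (65)
character = chr(65)       # code point -> character ("A")
eight_bit = capacity(8)   # 256

# Latin-1 is an 8-bit set: it has room for "é" but not for the Greek "α".
e_acute_ok = fits("é", "latin-1")   # True
alpha_ok = fits("α", "latin-1")     # False
```

With only 256 slots, an 8-bit set has to leave most of the world's characters out, which is why different 8-bit sets were created for different groups of languages.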

A second set of issues involves fonts, which are used to display and print characters. Even though an application may be able to interpret a particular character encoding, it may not be able to display a given character if it cannot find a suitable font. 

Why is this information important for translators and translation teachers?
The two sets of issues described above mean that a file created by one person might be unintelligible to its recipient, even if the recipient can read the language involved. Problems occur because documents are sent between countries to be viewed and modified on different types of computers, running different software that is set up to work in different languages. These problems affect language professionals in particular. Even a text written predominantly or entirely in a language such as English, which uses the Latin alphabet without diacritics, is not free from character encoding issues. An English text dealing with a mathematical subject might contain Greek letters. Moreover, the UK pound sign (£) is not present in all encoding systems.
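The kind of corruption described here can be reproduced with a short Python sketch. It assumes, purely for illustration, that the sender saved the text as UTF-8 and that the recipient's software opened it as `cp1252`; the sample sentence is invented.

```python
original = "Cost: £5 for α"

# The sender saves the text as UTF-8 bytes.
sent_bytes = original.encode("utf-8")

# The recipient's software wrongly assumes a Windows code page.
garbled = sent_bytes.decode("cp1252")    # "Cost: Â£5 for Î±"

# Decoding with the encoding that was actually used restores the text.
recovered = sent_bytes.decode("utf-8")

# Plain ASCII has no pound sign at all, so encoding it fails outright.
try:
    "£".encode("ascii")
    pound_in_ascii = True
except UnicodeEncodeError:
    pound_in_ascii = False
```

The bytes themselves are undamaged in transit; only the interpretation differs, so a garbled file can often be repaired by reopening it with the correct encoding.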
