Nisus Blog

Pronounced "Nice-us"


Alphabet Soup

December 15th, 2006

At bottom, the only thing a computer really understands is numbers. So when it comes time to process text, each character has to be assigned a particular number. These assignments form a map, which is called a character set or text encoding. Back in Nisus Writer Classic, many text encodings were in use: one for English text, another for Hebrew, another for Japanese, and so on.
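To make the "characters are just numbers" idea concrete, here is a small Python sketch. It is only an illustration: the sample strings are made up for this example, and Mac Roman and Latin-1 are stand-ins for "two different older encodings", not a claim about which encodings Nisus Writer Classic actually used.

```python
# Every character is stored as a number. ord() returns the number Python
# uses for a character, and chr() goes the other way.
text = "Nisus"
numbers = [ord(ch) for ch in text]
print(numbers)  # [78, 105, 115, 117, 115]

# Turning the numbers back into characters recovers the original text.
restored = "".join(chr(n) for n in numbers)
print(restored)  # Nisus

# Different encodings can assign different numbers to the same character.
# (Illustrative choice of encodings; both ship with Python.)
mac_byte = "é".encode("mac_roman")
latin_byte = "é".encode("latin-1")
print(mac_byte, latin_byte)  # b'\x8e' b'\xe9'
```

The last two lines are the important part: the same letter gets two different numbers depending on which map you consult.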

The problem with this approach is that to properly understand a sequence of characters, a program needs to know which text encoding was used, and that only works if every program knows every text encoding. If the encoding was unknown, your text was as good as gibberish. To make matters worse, some fonts had custom encodings. If you lost that font, your text was again foobared.
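Here is a rough Python sketch of how that gibberish comes about. The Hebrew word and the two encodings (Windows-1255 and Latin-1) are assumptions picked purely for illustration; any pair of mismatched encodings would show the same effect.

```python
# Save some Hebrew text using a Hebrew-specific encoding.
original = "שלום"
data = original.encode("cp1255")  # one byte per letter

# A program that guesses the wrong encoding still gets characters back,
# just not the ones that were written.
wrong = data.decode("latin-1")

# A program that knows the right encoding recovers the text exactly.
right = data.decode("cp1255")

print(wrong == original)  # False
print(right == original)  # True
```

Note that the wrong guess does not raise an error. The bytes are perfectly valid in both encodings, which is exactly why this kind of damage was so easy to miss.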

Luckily those days are over thanks to Unicode, a character set designed to include every character from every human writing system. Now your text, be it Arabic, Tibetan, or Thai, needs only one standard character set that all modern software understands. And thanks to the foresight of the Unicode Consortium, we even have an official numeric assignment for the snowman character.
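For the curious, the snowman lives at code point U+2603. The short Python sketch below looks it up and also shows one detail glossed over above: Unicode assigns the numbers, while a byte format such as UTF-8 or UTF-16 decides how those numbers are stored. The mixed-script sample string is made up for this example.

```python
import unicodedata

snowman = "\u2603"

# The official number and name assigned by the Unicode Consortium.
print(hex(ord(snowman)))          # 0x2603
print(unicodedata.name(snowman))  # SNOWMAN

# The same code point stored in two common byte formats.
utf8_bytes = snowman.encode("utf-8")
utf16_bytes = snowman.encode("utf-16-be")
print(utf8_bytes)   # b'\xe2\x98\x83'
print(utf16_bytes)  # b'&\x03'

# One encoding is enough for text that mixes several scripts.
mixed = "Hello שלום こんにちは " + snowman
round_trip = mixed.encode("utf-8").decode("utf-8")
print(round_trip == mixed)  # True
```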

Happy Holidays!

Tags: Martin
