Inside a TrueType Font
TrueType is the most widely used font format today. Apple developed the TrueType technology in the late 1980s, and the standard now dominates over all other font types.
Is there any reason to know what is inside a font?
I know of at least one reason: embedding fonts into documents. Some document formats can include TrueType fonts so that the document keeps a consistent look on all platforms. For example, a PDF document may have an embedded font, and an XPS document always includes one or more fonts.
Embedding fonts does not require knowledge of the internal font structure until you want to reduce the document size. Embedding even a single font may dramatically increase the document size, and documents that include many fonts or huge Unicode fonts can become bloated. This is where knowledge of the internal font structure becomes necessary.
What is a TrueType file? It is a set of binary tables that hold vector graphics and font properties. Because the glyphs are stored as vectors, they scale easily to various graphic devices.
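To make the "set of binary tables" idea concrete, here is a minimal sketch that lists the tables of a font file. It assumes the standard sfnt layout: a 12-byte header followed by one 16-byte directory record per table, all big-endian. The function name read_table_directory is made up for this illustration.

```python
import struct

def read_table_directory(data: bytes) -> dict:
    """Return {tag: (offset, length)} for every table in a TrueType file."""
    # Header: sfnt version (4 bytes), numTables, searchRange,
    # entrySelector, rangeShift (2 bytes each), all big-endian.
    _sfnt_version, num_tables = struct.unpack_from(">IH", data, 0)
    tables = {}
    for i in range(num_tables):
        # Each directory record: 4-byte tag, checksum, offset, length.
        tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i)
        tables[tag.decode("latin-1")] = (offset, length)
    return tables
```

With a real font you would pass the whole file content, for example the bytes read from a .ttf file opened in binary mode, and then look up entries such as "glyf" or "loca" in the returned dictionary.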
Let's look inside the Arial font shipped with Windows 7. It contains the following tables:
As you can see, "glyf" is the biggest table in the font. It contains the outlines and hinting definitions of all characters, numbers, signs, and eastern glyphs defined in the font. The outlines are made of straight line segments and quadratic Bezier curves. Each glyph's outline is followed by its hinting data, which is used to improve the visual appearance on low-DPI devices. The idea behind font packing is to remove unused glyphs from the "glyf" table.
The problem is that glyph records vary in size. To access a glyph's data, an application must consult an additional table, "loca". The "loca" table format is fairly simple: it holds one record for each glyph defined in the font, plus one extra record that marks the end of the last glyph's data. Each record is the position of a glyph's data within the "glyf" table. Let's look at it:
This table has an additional function: it is used to find the size of a glyph's data. Subtracting the position in the current record from the position in the next record gives the size of the glyph's data. For example, the glyph at position 0 is 42 bytes long. If a glyph has no outline, its size is zero. For example, the glyphs at positions 1, 2, and 3 in the table above have no outline information.
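The subtraction rule can be sketched in a few lines. This assumes the "loca" positions are already decoded into a list that ends with one extra entry marking the end of the last glyph. Only the first glyph's 42 bytes and the three empty glyphs come from the text; the last offset is invented for the illustration.

```python
def glyph_sizes(loca_offsets):
    """Size of each glyph = next position minus current position."""
    return [nxt - cur for cur, nxt in zip(loca_offsets, loca_offsets[1:])]

# Hypothetical positions: glyph 0 takes 42 bytes, glyphs 1-3 are empty,
# glyph 4 ends at an invented offset of 130.
sample_offsets = [0, 42, 42, 42, 42, 130]
sizes = glyph_sizes(sample_offsets)  # [42, 0, 0, 0, 88]
```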
Different nations use different languages and alphabets, and a TrueType font can contain various combinations of national alphabets. Getting a glyph's outline from a character code requires an additional table, "cmap". All characters are divided into segments, where each segment is a contiguous range of character codes that typically matches one of the alphabets defined in the font.
Let's walk through the procedure for converting a character code to an outline. First, we search the "cmap" table for the segment that contains the character code. Based on the segment's data and the character code, a simple formula gives us the glyph index, which points to a record in the "loca" table. The "loca" record gives us the position and size of the glyph data within the "glyf" table.
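The text does not say which "cmap" subtable format is meant. The sketch below assumes format 4, the segmented format commonly used for 16-bit Unicode codes, and assumes its four parallel arrays plus the glyph id array have already been read into Python lists. The function and parameter names are made up for this illustration.

```python
from bisect import bisect_left

def lookup_glyph_index(code, end_codes, start_codes, id_deltas,
                       id_range_offsets, glyph_id_array):
    """Map a character code to a glyph index (0 means 'missing glyph')."""
    seg_count = len(end_codes)
    # Segments are sorted by end code: find the first one that can hold 'code'.
    i = bisect_left(end_codes, code)
    if i == seg_count or start_codes[i] > code:
        return 0
    if id_range_offsets[i] == 0:
        # The simple case: glyph index = code + delta, modulo 65536.
        return (code + id_deltas[i]) & 0xFFFF
    # Otherwise the range offset points into the glyph id array. In the file
    # it is a byte offset relative to its own position, hence the adjustment.
    idx = id_range_offsets[i] // 2 + (code - start_codes[i]) - (seg_count - i)
    glyph = glyph_id_array[idx]
    return (glyph + id_deltas[i]) & 0xFFFF if glyph else 0
```

The returned index is then used to pick two neighbouring "loca" records, which give the position and size of the glyph inside "glyf".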
To reduce the font size, you should build a dictionary of the characters actually used and rebuild the "glyf" and "loca" tables with the help of that dictionary.
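One possible way to do the rebuild is sketched below. It is only one design choice, not the only one: unused glyphs are turned into empty records instead of being renumbered, so the glyph indices produced by "cmap" stay valid. It assumes the set of used glyph indices (keep) has already been derived from the character dictionary, and that the "loca" positions are decoded into a list with one extra end entry. The name subset_glyf is hypothetical.

```python
def subset_glyf(glyf: bytes, loca_offsets: list, keep: set):
    """Return (new_glyf, new_offsets) with glyphs outside 'keep' emptied."""
    new_glyf = bytearray()
    new_offsets = [0]
    for gid in range(len(loca_offsets) - 1):
        if gid in keep:
            new_glyf += glyf[loca_offsets[gid]:loca_offsets[gid + 1]]
            # Keep every position even so it can still be stored in the
            # 16-bit "loca" format, which holds position / 2.
            if len(new_glyf) % 2:
                new_glyf.append(0)
        # A dropped glyph gets the same start and end position: size zero.
        new_offsets.append(len(new_glyf))
    return bytes(new_glyf), new_offsets
```

In practice glyph 0, the "missing glyph" placeholder, would normally be added to keep as well.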
A glyph's data may contain no outline information of its own and instead consist of several references to other glyphs. Such glyphs are called composite glyphs. You should take this into account when you rebuild the "loca" and "glyf" tables, because the glyphs they reference must be kept as well.
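Here is a sketch of how the referenced glyphs could be collected. It assumes the usual "glyf" record layout: a 10-byte header whose first field, the contour count, is negative for a composite glyph, followed by component records whose flags say how many extra bytes each record carries. The helper names and the get_glyph_data callback are made up for this illustration.

```python
import struct

ARG_1_AND_2_ARE_WORDS = 0x0001
WE_HAVE_A_SCALE = 0x0008
MORE_COMPONENTS = 0x0020
WE_HAVE_AN_X_AND_Y_SCALE = 0x0040
WE_HAVE_A_TWO_BY_TWO = 0x0080

def component_glyphs(glyph_data: bytes) -> list:
    """Glyph indices referenced by a composite glyph ([] for other glyphs)."""
    if len(glyph_data) < 10:
        return []  # empty glyph
    (num_contours,) = struct.unpack_from(">h", glyph_data, 0)
    if num_contours >= 0:
        return []  # simple glyph with its own outline
    refs, pos = [], 10
    while True:
        flags, glyph_index = struct.unpack_from(">HH", glyph_data, pos)
        refs.append(glyph_index)
        pos += 4
        pos += 4 if flags & ARG_1_AND_2_ARE_WORDS else 2  # placement args
        if flags & WE_HAVE_A_SCALE:
            pos += 2
        elif flags & WE_HAVE_AN_X_AND_Y_SCALE:
            pos += 4
        elif flags & WE_HAVE_A_TWO_BY_TWO:
            pos += 8
        if not flags & MORE_COMPONENTS:
            return refs

def close_over_components(used, get_glyph_data):
    """Extend 'used' with every glyph reachable through composite glyphs."""
    result, stack = set(used), list(used)
    while stack:
        for ref in component_glyphs(get_glyph_data(stack.pop())):
            if ref not in result:
                result.add(ref)
                stack.append(ref)
    return result
```

The set returned by close_over_components is the one that should drive the rebuild of "glyf" and "loca", otherwise a kept composite glyph could point at a glyph that was removed.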
Therefore, the three most important tables in a TrueType font are "cmap", "loca", and "glyf". What about the other tables? All TrueType tables can be classed as either required or optional. We cannot remove the required tables, but we can easily remove some optional ones. For example, some fonts have embedded bitmap data drawn by an artist for rendering on low-DPI devices. These tables can be safely removed from the font with one side effect: lower rendering quality on low-resolution devices such as CRT or LCD monitors. However, you will never see any difference in the packed fonts on modern printing devices (laser and inkjet printers).
One of the important tables in the font is the "name" table. It stores various human-readable font properties. Here is an example of some information from Arial's "name" table:
The indexToLocFormat and glyphDataFormat fields of the "head" table define the formats of the "loca" and "glyf" tables, respectively.
The CreatedDateTime and ModifiedDateTime fields hold the number of seconds since midnight, January 1, 1904.
The xMin, xMax, yMin, and yMax fields define a bounding box that encloses every glyph in the font.
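Converting such a value to a calendar date is a one-liner. The sketch below treats the epoch as UTC, which is an assumption made only for this illustration, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Epoch used by the "head" table timestamps (assumed here to be UTC).
FONT_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)

def head_timestamp_to_datetime(seconds: int) -> datetime:
    """Convert seconds since January 1, 1904 to a datetime."""
    return FONT_EPOCH + timedelta(seconds=seconds)
```

For comparison with the more familiar Unix epoch, a value of 2082844800 corresponds to January 1, 1970.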
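The fields discussed here can be read with a single unpack call. The sketch assumes the standard 54-byte "head" layout and uses the field names from the TrueType specification, where "created" and "modified" are the timestamps referred to above as CreatedDateTime and ModifiedDateTime. The function name parse_head is made up.

```python
import struct

HEAD_FORMAT = ">IIIIHHqqhhhhHHhhh"  # 54 bytes, big-endian, no padding
HEAD_FIELDS = (
    "version", "fontRevision", "checkSumAdjustment", "magicNumber",
    "flags", "unitsPerEm", "created", "modified",
    "xMin", "yMin", "xMax", "yMax",
    "macStyle", "lowestRecPPEM", "fontDirectionHint",
    "indexToLocFormat", "glyphDataFormat",
)

def parse_head(head: bytes) -> dict:
    """Decode the "head" table into a {field name: value} dictionary."""
    fields = dict(zip(HEAD_FIELDS, struct.unpack_from(HEAD_FORMAT, head, 0)))
    # Every valid "head" table carries this fixed magic number.
    if fields["magicNumber"] != 0x5F0F3CF5:
        raise ValueError("not a valid 'head' table")
    return fields
```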
unitsPerEm is an important field that defines how many font design units make up one em, the square that historically corresponds to the width of the Latin M character. The font rendering engine uses this value to scale font outlines to a device bitmap.
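The scaling itself is simple arithmetic. The sketch below uses the usual conversion, in which one point is 1/72 of an inch. The function name and the sample numbers (2048 units per em, 12 pt, 96 DPI) are chosen only for illustration.

```python
def font_units_to_pixels(value, units_per_em, point_size, dpi):
    """Scale a coordinate in font design units to device pixels."""
    return value * point_size * dpi / (72.0 * units_per_em)

# A full em of 2048 units at 12 pt on a 96 DPI screen is 16 pixels.
em_in_pixels = font_units_to_pixels(2048, 2048, 12, 96)  # 16.0
```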
Only one field is important for font packing purposes: indexToLocFormat. This field defines the record size of the "loca" table. A value of 0 means each "loca" record is a 16-bit unsigned integer that stores the glyph position divided by two. A value of 1 means each "loca" record is a 32-bit unsigned integer that stores the position itself.
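Decoding "loca" according to this field could look like the sketch below. It assumes the number of glyphs is known (it is stored in the "maxp" table), that the table holds one more record than there are glyphs, and that the 16-bit format stores each position divided by two. The function name parse_loca is hypothetical.

```python
import struct

def parse_loca(loca: bytes, num_glyphs: int, index_to_loc_format: int) -> list:
    """Return glyph positions in "glyf" as a list of num_glyphs + 1 byte offsets."""
    count = num_glyphs + 1  # one extra record marks the end of the last glyph
    if index_to_loc_format == 0:
        # 16-bit records hold position / 2, so multiply back.
        return [2 * v for v in struct.unpack_from(">%dH" % count, loca, 0)]
    if index_to_loc_format == 1:
        # 32-bit records hold the byte position directly.
        return list(struct.unpack_from(">%dI" % count, loca, 0))
    raise ValueError("unknown indexToLocFormat: %r" % index_to_loc_format)
```

When a packed font becomes small enough, the same rule works in reverse: positions that are all even and below 131072 can be written back in the 16-bit form, as long as indexToLocFormat is set to 0 to match.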