The basic idea is simple: someone types some text and you convert it into an image. Later on you can add color and other options. The character codes you get from the operating system identify the correct glyph in the font. Opening the font and rendering those glyphs into an image is the slightly more confusing part.
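
To make the lookup-then-draw idea concrete, here is a minimal sketch in Python. It does not open a real font file. Instead it assumes a tiny made-up 3x5 bitmap font (`FONT`), and the names `render_text` and `to_pbm` are hypothetical helpers invented for this illustration. The point is only to show a character code selecting a glyph and that glyph being copied into a pixel grid.

```python
# Hypothetical 3x5 bitmap "font": maps a character code to rows of pixels.
# A real font format is far more involved; this only illustrates the idea.
FONT = {
    ord("H"): ["#.#", "#.#", "###", "#.#", "#.#"],
    ord("I"): ["###", ".#.", ".#.", ".#.", "###"],
    ord(" "): ["...", "...", "...", "...", "..."],
}
GLYPH_W, GLYPH_H = 3, 5
FALLBACK = ["###", "#.#", "#.#", "#.#", "###"]  # box drawn for unknown characters


def render_text(text, spacing=1):
    """Return the image as a list of rows, each a list of 0/1 pixels."""
    if not text:
        return [[] for _ in range(GLYPH_H)]
    width = len(text) * GLYPH_W + (len(text) - 1) * spacing
    image = [[0] * width for _ in range(GLYPH_H)]
    for i, ch in enumerate(text):
        glyph = FONT.get(ord(ch), FALLBACK)  # character code -> glyph
        x0 = i * (GLYPH_W + spacing)         # left edge of this glyph
        for y, row in enumerate(glyph):
            for x, cell in enumerate(row):
                if cell == "#":
                    image[y][x0 + x] = 1
    return image


def to_pbm(image):
    """Serialize the pixel grid as a plain-text PBM (P1) image."""
    height = len(image)
    width = len(image[0]) if image else 0
    rows = [" ".join(str(p) for p in row) for row in image]
    return "P1\n{} {}\n".format(width, height) + "\n".join(rows) + "\n"


if __name__ == "__main__":
    print(to_pbm(render_text("HI")))
```

Saving the printed output to a `.pbm` file should give something most image viewers can open. With a real font you would swap the `FONT` dictionary for a font library that does the glyph lookup and rasterizing for you, and color would just mean storing more than 0/1 per pixel.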

Does this clarify it a bit more?