I recently ran into a situation where I wanted to parse the content out of a .MSG file. As you may or may not know, a .MSG file is a compound file. As such, it contains a bunch of streams, mostly in either ASCII or binary format. One of its properties, PR_RTF_COMPRESSED, usually contains a compressed version of the RTF body of the message (very rarely it is stored uncompressed, although I have never seen that myself). If you wanted to create a .MSG viewer of some sort, you would need something to pull out the nicely formatted RTF.
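To make the compressed versus uncompressed distinction concrete, here is a minimal sketch of reading the 16-byte header that precedes the RTF data in the PR_RTF_COMPRESSED stream. The layout (COMPSIZE, RAWSIZE, COMPTYPE, CRC, all little-endian) is my reading of the MS-OXRTFCP document; the names CompressedRtfHeader and CompressedRtfHeaderReader are made up for this example and are not part of the RtfDecompressor class.

Code:
```vbnet
Imports System

' Hypothetical helper types for illustration only.
Public Structure CompressedRtfHeader
    Public CompSize As UInteger   ' number of bytes that follow the COMPSIZE field
    Public RawSize As UInteger    ' size of the RTF once decompressed
    Public CompType As UInteger   ' magic number: "LZFu" or "MELA"
    Public Crc As UInteger        ' CRC of the data that follows the header
End Structure

Public Module CompressedRtfHeaderReader
    Public Const HeaderLength As Integer = 16
    Public Const CompressedMagic As UInteger = &H75465A4CUI    ' bytes "LZFu"
    Public Const UncompressedMagic As UInteger = &H414C454DUI  ' bytes "MELA"

    ' Reads a little-endian 32-bit value without depending on machine byte order.
    Private Function ReadUInt32LE(data As Byte(), offset As Integer) As UInteger
        Return CUInt(data(offset)) Or
               (CUInt(data(offset + 1)) << 8) Or
               (CUInt(data(offset + 2)) << 16) Or
               (CUInt(data(offset + 3)) << 24)
    End Function

    Public Function ReadHeader(data As Byte()) As CompressedRtfHeader
        If data Is Nothing OrElse data.Length < HeaderLength Then
            Throw New ArgumentException("Stream is too short to contain a header.")
        End If
        Dim header As CompressedRtfHeader
        header.CompSize = ReadUInt32LE(data, 0)
        header.RawSize = ReadUInt32LE(data, 4)
        header.CompType = ReadUInt32LE(data, 8)
        header.Crc = ReadUInt32LE(data, 12)
        Return header
    End Function

    ' True when the RTF after the header is compressed, False when it is stored as-is.
    Public Function IsCompressed(header As CompressedRtfHeader) As Boolean
        If header.CompType = CompressedMagic Then Return True
        If header.CompType = UncompressedMagic Then Return False
        Throw New FormatException("Unknown COMPTYPE value in header.")
    End Function
End Module
```

Checking COMPTYPE first is a cheap way to tell whether a stream is really PR_RTF_COMPRESSED data before you hand it to a decompressor.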
When you create an email in Outlook (I am using 2003 for this), there is a ComboBox that specifies what format the email will use:
This is set to HTML by default. Let's set it to Rich Text for now. If I send this message to someone, drag the email out of Outlook into Windows Explorer, and then run the RtfDecompressor on it, I will get the RTF for it. I can then bind the RTF to a RichTextBox for a preview of what the formatted and colorized email content looked like (sample application):
Unfortunately, due to the .NET implementation of the RichTextBox, links are not displayed all that well, and the line breaks that separate them are not always preserved. There is not much you can do about this (the same RTF displays perfectly in Microsoft Word and WordPad).
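For reference, binding decompressed RTF to a RichTextBox could look roughly like the sketch below. It assumes you already have the decompressed RTF as a byte array; the RtfPreview module and the CreatePreviewBox function are names invented for this example and are not taken from the sample application.

Code:
```vbnet
Imports System
Imports System.Text
Imports System.Windows.Forms

' Hypothetical helper for illustration only.
Public Module RtfPreview
    ' Builds a read-only RichTextBox that renders the given decompressed RTF.
    Public Function CreatePreviewBox(decompressedRtf As Byte()) As RichTextBox
        If decompressedRtf Is Nothing Then
            Throw New ArgumentNullException("decompressedRtf")
        End If

        Dim box As New RichTextBox()
        box.Dock = DockStyle.Fill
        box.ReadOnly = True

        ' Assign to Rtf, not Text: Rtf parses the markup, while Text would
        ' show the raw control words. Rtf throws an ArgumentException if
        ' the string is not valid RTF.
        box.Rtf = Encoding.ASCII.GetString(decompressedRtf)
        Return box
    End Function
End Module
```

You would then add the returned control to a form, for example with Me.Controls.Add(RtfPreview.CreatePreviewBox(decompressedRtf)).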
To use this class, you must have a basic understanding of how compound files work. I will eventually be posting another thread that shows how you can access the different streams of a compound file, specifically the .MSG file. Anyway, the use is simple:
Code:
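'// Requires: Imports System.Text and Imports System.Windows.Forms
'// GetCompressedRtf() stands in for your own code that returns the
'// raw PR_RTF_COMPRESSED stream of the .MSG file
Dim compressedRtf = Me.GetCompressedRtf()
'// Decompress accepts either a System.Byte() or a
'// System.IO.Stream and returns the decompressed bytes
Dim decompressedRtf = RtfDecompressor.Decompress(compressedRtf)
'// turn the bytes into a String so the RTF can be displayed
Dim text = Encoding.ASCII.GetString(decompressedRtf)
MessageBox.Show(text)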
The algorithm includes a section on CRC checking. When you call Decompress, you can specify whether it should enforce a CRC check. By default, no CRC check is enforced.
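If you want to see what such a check involves, here is a rough sketch of one. It is based on my understanding of MS-OXRTFCP: the CRC uses the usual CRC-32 lookup table (polynomial &HEDB88320), but starts from 0 and applies no final inversion, and it covers the bytes that follow the 16-byte header. The RtfCrc module and its function names are hypothetical, and this is not necessarily how RtfDecompressor does it internally.

Code:
```vbnet
Imports System

' Hypothetical helper for illustration only.
Public Module RtfCrc
    Private Const HeaderLength As Integer = 16
    Private ReadOnly Table As UInteger() = BuildTable()

    ' Standard reflected CRC-32 lookup table.
    Private Function BuildTable() As UInteger()
        Dim result(255) As UInteger
        For i As Integer = 0 To 255
            Dim c As UInteger = CUInt(i)
            For bit As Integer = 1 To 8
                If (c And 1UI) <> 0UI Then
                    c = &HEDB88320UI Xor (c >> 1)
                Else
                    c = c >> 1
                End If
            Next
            result(i) = c
        Next
        Return result
    End Function

    ' Assumption: initial value 0 and no final Xor, unlike the common CRC-32.
    Public Function Compute(data As Byte(), offset As Integer, count As Integer) As UInteger
        Dim crc As UInteger = 0UI
        For i As Integer = offset To offset + count - 1
            crc = Table(CInt((crc Xor data(i)) And &HFFUI)) Xor (crc >> 8)
        Next
        Return crc
    End Function

    ' Compares the CRC stored at header offset 12 with the CRC of the payload.
    Public Function MatchesStoredCrc(stream As Byte()) As Boolean
        If stream Is Nothing OrElse stream.Length < HeaderLength Then
            Throw New ArgumentException("Stream is too short to contain a header.")
        End If
        Dim stored As UInteger = CUInt(stream(12)) Or
                                 (CUInt(stream(13)) << 8) Or
                                 (CUInt(stream(14)) << 16) Or
                                 (CUInt(stream(15)) << 24)
        Return Compute(stream, HeaderLength, stream.Length - HeaderLength) = stored
    End Function
End Module
```

Because the check only reads the stream, you can run it before decompressing and decide for yourself whether a mismatch should be an error or just a warning.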
Keep in mind that this will work for all message formats; however, depending on the format you selected, you will get very different decompressed RTF. If you select HTML, the decompressed RTF will basically contain pure HTML embedded in RTF. The Rich Text option will be the closest to what you see in Outlook.
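One way to tell these cases apart after decompressing is to look for the control words that Outlook writes when it wraps another format in RTF. As far as I know from MS-OXRTFEX, \fromhtml marks HTML wrapped in RTF and \fromtext marks plain text wrapped in RTF. The sketch below is a simple substring check, not a real RTF parser, and the RtfBodyKind and RtfBodyDetector names are invented for this example.

Code:
```vbnet
Imports System

' Hypothetical helper types for illustration only.
Public Enum RtfBodyKind
    RichText        ' ordinary RTF, closest to what Outlook shows
    EncapsulatedHtml
    EncapsulatedText
End Enum

Public Module RtfBodyDetector
    ' Classifies decompressed RTF by searching for the encapsulation control words.
    ' VB strings have no backslash escapes, so "\fromhtml" is matched literally.
    Public Function Detect(rtf As String) As RtfBodyKind
        If rtf Is Nothing Then
            Throw New ArgumentNullException("rtf")
        End If
        If rtf.IndexOf("\fromhtml", StringComparison.Ordinal) >= 0 Then
            Return RtfBodyKind.EncapsulatedHtml
        End If
        If rtf.IndexOf("\fromtext", StringComparison.Ordinal) >= 0 Then
            Return RtfBodyKind.EncapsulatedText
        End If
        Return RtfBodyKind.RichText
    End Function
End Module
```

A viewer could use the result to decide whether to show the RTF in a RichTextBox or to extract the HTML and show that in a browser control instead.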
Applications
Microsoft Exchange Server 2003
Microsoft Exchange Server 2007
Microsoft Exchange Server 2010
Microsoft Office Outlook 2003
Microsoft Office Outlook 2007
Microsoft Office Outlook 2010
Plans for this thread
Implement the Compression Algorithm
Show how you can expose the streams of a .MSG so that you can access the compressed RTF stream