I'm trying to convert an Outlook .msg file to PDF. To do this, I first save the .msg file as an .html file and then export the .html file as a .pdf file. It works perfectly except for the images in the .msg file: every image is shown as an "x" mark in the HTML.
Here is my code. I take the folder containing the .msg files and write all the file names to a text file. Each line of the text file is then read and the .msg file is checked for attachments. Only attached .msg files are saved as .html and then as .pdf, and the main .msg file is also saved as .html and .pdf.
Code:
Option Strict On
Imports Microsoft.Office.Interop
Imports System.IO
Imports Microsoft.Office.Interop.Word
Imports System.Windows.Forms.Application
Imports System.Text
Imports System.Drawing.Imaging
Public Class Form1
    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
        Dim path As String = ""
        Dim folderBrowserDialog1 As FolderBrowserDialog
        folderBrowserDialog1 = New System.Windows.Forms.FolderBrowserDialog()
        Dim resultOK As DialogResult = folderBrowserDialog1.ShowDialog
        ' get the selected path value
        If resultOK = System.Windows.Forms.DialogResult.OK Then
            path = folderBrowserDialog1.SelectedPath
            path = path & "\"
        End If
        Dim di As New IO.DirectoryInfo(path)
        Dim diar1 As IO.FileInfo() = di.GetFiles("*.msg", IO.SearchOption.AllDirectories)
        Dim dra As IO.FileInfo
        'list the names of all files in the specified directory
        For Each dra In diar1
            ListBox1.Items.Add(dra.FullName)
        Next
        Dim FileNumber As Integer = FreeFile()
        FileOpen(FileNumber, path & "ListofExcelFiles.txt", OpenMode.Output)
        For Each Item As Object In ListBox1.Items
            PrintLine(FileNumber, Item.ToString)
        Next
        FileClose(FileNumber)
        Dim FILE_NAME As String = path & "ListofExcelFiles.txt"
        If System.IO.File.Exists(FILE_NAME) = True Then
            Dim objReader As New System.IO.StreamReader(FILE_NAME, System.Text.Encoding.UTF7)
            Dim TextLine As String = ""
            Do While objReader.Peek() <> -1
                TextLine = objReader.ReadLine()
                '***************************************************
                Dim position As Integer = TextLine.LastIndexOf("\"c)
                Dim getname As String
                getname = TextLine.Substring(position + 1)
                'MessageBox.Show(getname)
                Dim extension = System.IO.Path.GetExtension(getname)
                Dim result = getname.Substring(0, getname.Length - extension.Length)
                'MessageBox.Show(result)
                Dim newFolder As String = path & result & "\"
                If IO.Directory.Exists(newFolder) Then
                    IO.Directory.Delete(newFolder, True)
                    IO.Directory.CreateDirectory(newFolder)
                ElseIf Not IO.Directory.Exists(newFolder) Then
                    IO.Directory.CreateDirectory(newFolder)
                End If
                'Dim MSWordPageSetup As WdOrientation = WdOrientation.wdOrientLandscape
                Dim MasterFileName As String = path & result
                Dim MSWordExportFilePath As String = newFolder & result & ".pdf"
                Dim MSWordExportFormat As WdExportFormat = WdExportFormat.wdExportFormatPDF
                Dim MSWordOpenAfterExport As Boolean = False
                Dim MSWordExportOptimizeFor As WdExportOptimizeFor = WdExportOptimizeFor.wdExportOptimizeForPrint
                Dim MSWordExportRange As WdExportRange = WdExportRange.wdExportAllDocument
                Dim MSWordStartPage As Int32 = 0
                Dim MSWordEndPage As Int32 = 0
                Dim MSWordExportItem As WdExportItem = WdExportItem.wdExportDocumentContent
                Dim MSWordIncludeDocProps As Boolean = True
                Dim MSWordKeepIRM As Boolean = True
                Dim MSWordCreateBookmarks As WdExportCreateBookmarks = WdExportCreateBookmarks.wdExportCreateWordBookmarks
                Dim MSWordDocStructureTags As Boolean = True
                Dim MSWordBitmapMissingFonts As Boolean = True
                Dim MSWordUseISO19005_1 As Boolean = False
                TextBox1.Clear()
                Dim oApp As New Outlook.Application
                Dim OMItem As Outlook.MailItem
                TextBox1.AppendText("Reading the .msg file" & vbNewLine)
                OMItem = CType(oApp.CreateItemFromTemplate(MasterFileName & ".msg"), Outlook.MailItem)
                Dim mailAttachments As Outlook.Attachments = OMItem.Attachments
                Dim attachmentInfo As StringBuilder = New StringBuilder()
                If Not IsNothing(mailAttachments) Then
                    For i As Integer = 1 To mailAttachments.Count
                        Dim currentAttachment As Outlook.Attachment = mailAttachments.Item(i)
                        If Not IsNothing(currentAttachment) Then
                            attachmentInfo.AppendFormat("#{0}", i)
                            attachmentInfo.AppendLine()
                            attachmentInfo.AppendFormat("File Name: {0}", currentAttachment.FileName)
                            attachmentInfo.AppendLine()
                            attachmentInfo.AppendFormat("Display Name: {0}", currentAttachment.DisplayName)
                            attachmentInfo.AppendLine()
                            attachmentInfo.AppendFormat("Type: {0}", currentAttachment.Type)
                            attachmentInfo.AppendLine()
                            attachmentInfo.AppendLine()
                            Dim attachmentName As String = currentAttachment.FileName
                            Dim counter As Integer = 0
                            Dim newFileName As String = attachmentName
                            While File.Exists(newFolder & newFileName)
                                counter = counter + 1
                                Dim positionAtt As Integer = newFileName.LastIndexOf("\"c)
                                Dim getAttname As String
                                getAttname = newFileName.Substring(positionAtt + 1)
                                Dim extensionAtt = System.IO.Path.GetExtension(getAttname)
                                Dim newFileNwmeWithoutExt = getAttname.Substring(0, getAttname.Length - extensionAtt.Length)
                                newFileName = String.Format("{0}({1})", newFileNwmeWithoutExt, counter.ToString())
                                newFileName = newFileName & extensionAtt
                            End While
                            If newFileName.Contains(".msg") Then
                                currentAttachment.SaveAsFile(newFolder & newFileName)
                                Dim oApp1 As New Outlook.Application
                                Dim OMAItem As Outlook.MailItem
                                '//TextBox1.AppendText("Reading the .msg file" & vbNewLine)
                                Dim positionAtt As Integer = newFileName.LastIndexOf("\"c)
                                Dim getAttname As String
                                getAttname = newFileName.Substring(positionAtt + 1)
                                Dim extensionAtt = System.IO.Path.GetExtension(getAttname)
                                Dim currentAttachmentName = getAttname.Substring(0, getAttname.Length - extensionAtt.Length)
                                OMAItem = CType(oApp.CreateItemFromTemplate(newFolder & newFileName), Outlook.MailItem)
                                Dim name As String = newFolder & currentAttachmentName
                                Dim SW1 As StreamWriter = New StreamWriter(name & ".html")
                                SW1.Write(OMAItem.HTMLBody)
                                SW1.Close()
                                oApp1.Quit()
                                OMAItem = Nothing
                                oApp1 = Nothing
                                Dim wAppAtt As New Word.Application
                                Dim wdocAtt As New Word.Document
                                Dim MSWordExportFilePathAtt As String = name & ".pdf"
                                '//TextBox1.AppendText("Reading the HTML file" & vbNewLine)
                                wdocAtt = wAppAtt.Documents.Open(name & ".html")
                                '//TextBox1.AppendText("Saving the PDF file" & vbNewLine)
                                wdocAtt.ExportAsFixedFormat(MSWordExportFilePathAtt, _
                                    MSWordExportFormat, MSWordOpenAfterExport, _
                                    MSWordExportOptimizeFor, MSWordExportRange, MSWordStartPage, _
                                    MSWordEndPage, MSWordExportItem, MSWordIncludeDocProps, _
                                    MSWordKeepIRM, MSWordCreateBookmarks, _
                                    MSWordDocStructureTags, MSWordBitmapMissingFonts, _
                                    MSWordUseISO19005_1)
                                wdocAtt.Close()
                                wAppAtt.Quit()
                                wdocAtt = Nothing
                                wAppAtt = Nothing
                            End If
                        End If
                    Next
                End If
                TextBox1.AppendText("Writing as HTML file" & vbNewLine)
                Dim SW As StreamWriter = New StreamWriter(newFolder & result & ".html")
                SW.Write(OMItem.HTMLBody)
                SW.Close()
                oApp.Quit()
                OMItem = Nothing
                oApp = Nothing
                Dim wApp As New Word.Application
                Dim wdoc As New Word.Document
                TextBox1.AppendText("Reading the HTML file" & vbNewLine)
                wdoc = wApp.Documents.Open(newFolder & result & ".html")
                TextBox1.AppendText("Saving the PDF file" & vbNewLine)
                wdoc.ExportAsFixedFormat(MSWordExportFilePath, _
                    MSWordExportFormat, MSWordOpenAfterExport, _
                    MSWordExportOptimizeFor, MSWordExportRange, MSWordStartPage, _
                    MSWordEndPage, MSWordExportItem, MSWordIncludeDocProps, _
                    MSWordKeepIRM, MSWordCreateBookmarks, _
                    MSWordDocStructureTags, MSWordBitmapMissingFonts, _
                    MSWordUseISO19005_1)
                wdoc.Close()
                wApp.Quit()
                wdoc = Nothing
                wApp = Nothing
                If System.IO.File.Exists(TextLine) = True Then
                    Dim mail As String = newFolder & getname
                    System.IO.File.Copy(TextLine, mail)
                End If
            Loop
        End If
        TextBox1.AppendText("All done" & vbNewLine)
        MessageBox.Show("Converted")
        Button1.Enabled = False
        Me.Close()
    End Sub
End Class
Thanks in advance.
Last edited by vijay2482; Oct 1st, 2015 at 10:33 AM.
Reason: (solved - Conversion of Outlook email to PDF files)
When you open the .msg file, I bet you see a red "x" from the get-go, meaning it may not be a "conversion" issue at all. The first thing I would look at is a possible Outlook "Trust Center" issue. If you open the message, Outlook probably states:
"To help protect your privacy, Outlook prevented automatic download of some pictures in this message".
Open the Trust Center (File, Options, Trust Center, "Trust Center Settings") and look at the "Automatic Download" tab. You might want to tweak those settings depending on what setup you are comfortable with.
There definitely is no straightforward way to output to PDF, other than to use a "print to PDF" function or do essentially what you are already doing.
Thanks a lot for your reply.
Automatic download is currently not allowed in my Outlook settings.
I will need to check with the admin.
Thanks for helping to figure out the issue.
Hi, I took the .msg files that contain the images, and the images are displayed correctly when the .msg files are opened.
After converting to .html, I see that the images are not carried over; instead a black "x" is displayed in the .html.
Automatic download is allowed now.
Any suggestions, please?
Last edited by vijay2482; Sep 23rd, 2015 at 03:04 AM.
I really don't think a black X shows in the HTML: do you mean in the HTML renderer (e.g. the web browser)? What does the HTML actually look like for the image?
"Ok, my response to that is pending a Google search" - Bucky Katt. "There are two types of people in the world: Those who can extrapolate from incomplete data sets." - Unk. "Before you can 'think outside the box' you need to understand where the box is."
Hi, thanks for the reply. I'm using Internet Explorer as my web browser.
Please see the attachment for how the images are displayed in the .html page.
Again, what does the HTML look like for the image?
No, that's the browser rendering the HTML. You need to show the actual HTML code; specifically for the image.
The IMG tag points to a location of an image, or a resource, or is base64 encoded. It's looking for a resource (Image) which isn't where it thinks it is. Mail items have resources with ID numbers which a mail application reads and formats. When converted to HTML, it needs to extract the images. The HTML image tag needs to point to the correct location.
You will have to extract the image(s) and either:
* Put them in a storage location and modify the HTML to link to the image;
* Encode the image to base64 and replace the src with the base64 stream - some browsers may not render this correctly, however.
The src of the images is the resource ID in the message file.
HTML doesn't contain image data; it's a text-based file format.
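For the base64 option, a minimal sketch of building the replacement value could look like the following. The module and function names (`ImageEmbedding`, `ToDataUri`) are made up for illustration, and it assumes the image has already been saved to disk and that its MIME type can be guessed from the file extension:

```vbnet
Imports System
Imports System.IO

Module ImageEmbedding
    ' Hypothetical helper: builds a data URI such as "data:image/png;base64,..."
    ' from an image file that has already been saved to disk.
    ' Assumption: the file extension is a good enough guess at the MIME subtype.
    Function ToDataUri(imagePath As String) As String
        Dim ext As String = Path.GetExtension(imagePath).TrimStart("."c).ToLowerInvariant()
        If ext = "jpg" Then ext = "jpeg"
        Dim bytes As Byte() = File.ReadAllBytes(imagePath)
        Return "data:image/" & ext & ";base64," & Convert.ToBase64String(bytes)
    End Function
End Module
```

The returned string would then go wherever the original src value was, for example `html = html.Replace(oldSrc, ToDataUri(savedImagePath))`, where `oldSrc` and `savedImagePath` are placeholders for the values in your own message.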
Hi, I have added code to get the image files and save them in the folder where the .msg files are stored.
Code:
If newFileName.Contains(".png") Or newFileName.Contains(".jpeg") Or newFileName.Contains(".jpg") Or newFileName.Contains(".gif") Or newFileName.Contains(".bmp") Then
    currentAttachment.SaveAsFile(newFolder & newFileName)
End If
You will have to modify the HTML to properly reflect the path to the objects. It's just a string, so you could do it that way, or I'm sure you can treat it as an XML document and modify images through the document structure. Have a look at that.
Thanks.
I could get the base64 stream for an image, but I'm not sure how to modify the <img> tag with the base64 data and save it as a Word or HTML document.
It's just a text file or string. Do you know how to modify strings? There are lots and lots of mechanisms to do that. You would replace the image source with the base 64 string, or point to the file.
I could save the base64 data into a text file or a string.
My issue is that the HTML file is written directly from the HTMLBody.
Code:
Dim SW As StreamWriter = New StreamWriter(newFolder & result & ".html", False, System.Text.Encoding.Default)
SW.Write(OMItem.HTMLBody)
SW.Close()
Do I need to open that HTML file and edit the source of the image?
If so, how do I search for the "src" in the HTML file? There is more than one image in my .msg file, so how do I identify which image source has to be linked to which base64 string or file?
I have written the code below to get the src of each image from the HTML and replace it with a file.
Code:
Dim b As String = "C:\aaaa\image002.png"
Dim objReader2 As New System.IO.StreamReader(newFolder & result & ".html", System.Text.Encoding.Default)
Dim TextLine2 As String = ""
Do While objReader2.Peek() <> -1
    TextLine2 = objReader2.ReadLine()
    Try
        Dim RegexObj As New Regex("<img[^>]+src=[""']([^""']+)[""']", RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        Dim MatchResults As Match = RegexObj.Match(TextLine2)
        While MatchResults.Success
            MatchResults = MatchResults.NextMatch()
            MessageBox.Show(MatchResults.Groups(1).Value)
            MatchResults.Groups(1).Value.Replace(MatchResults.Groups(1).Value, b)
        End While
    Catch ex As ArgumentException
    End Try
Loop
objReader2.Close()
I'm not sure how to save the HTML with the modified src.
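Two things in the snippet above stand out: `NextMatch()` is called before the current match is read, so the first image is skipped, and `String.Replace` returns a new string instead of changing the existing one, so its result is thrown away. One possible way around both, sketched here rather than tested against real messages, is to read the whole file, let `Regex.Replace` rewrite every src in one pass, and write the text back. The sketch assumes the src values look like `cid:image002.png@...` and that each image was saved next to the .html file under the name that appears before the `@`. The names `HtmlImageFixer`, `RewriteImageSources` and `FixHtmlFile` are hypothetical:

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module HtmlImageFixer
    ' Rewrites <img ... src="cid:name@id"> so that src becomes just "name",
    ' i.e. a file expected to sit in the same folder as the .html file.
    ' Assumption: the part of the cid before "@" matches the saved file name.
    Function RewriteImageSources(html As String) As String
        Dim pattern As String = "(<img[^>]+src=[""'])cid:([^""'@]+)[^""']*([""'])"
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        ' $1 = everything up to the opening quote, $2 = file name, $3 = closing quote
        Return Regex.Replace(html, pattern, "$1$2$3", options)
    End Function

    ' Reads the file, rewrites the image sources and saves it in place.
    Sub FixHtmlFile(htmlPath As String)
        Dim html As String = File.ReadAllText(htmlPath, Encoding.Default)
        File.WriteAllText(htmlPath, RewriteImageSources(html), Encoding.Default)
    End Sub
End Module
```

Calling `FixHtmlFile(newFolder & result & ".html")` after the StreamWriter is closed and before Word opens the file should be enough for the linked-file approach. Any src that does not start with `cid:` is left untouched.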