-
Image files not converted to html
Hi all,
I'm trying to convert an Outlook .msg file to PDF. To do this, I first convert the .msg file to an .html file and then export the .html file as a .pdf file. It works perfectly except for the images in the .msg file. All the images are shown with an "x" mark in the HTML.
Here is my code: I get the folder with the .msg files and write all the file names to a text file. Each line of the text file is read and the .msg file is checked for attachments. Only the .msg attachments are saved as .html and then as .pdf, and the main .msg file is also saved as .html and .pdf.
Code:
Option Strict On
Imports Microsoft.Office.Interop
Imports System.IO
Imports Microsoft.Office.Interop.Word
Imports System.Windows.Forms.Application
Imports System.Text
Imports System.Drawing.Imaging
Public Class Form1
Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
Dim path As String = ""
Dim folderBrowserDialog1 As FolderBrowserDialog
folderBrowserDialog1 = New System.Windows.Forms.FolderBrowserDialog()
Dim resultOK As DialogResult = folderBrowserDialog1.ShowDialog
' get the selected path value
If resultOK = System.Windows.Forms.DialogResult.OK Then
path = folderBrowserDialog1.SelectedPath
path = path & "\"
End If
Dim di As New IO.DirectoryInfo(path)
Dim diar1 As IO.FileInfo() = di.GetFiles("*.msg", IO.SearchOption.AllDirectories)
Dim dra As IO.FileInfo
'list the names of all files in the specified directory
For Each dra In diar1
ListBox1.Items.Add(dra.FullName)
Next
Dim FileNumber As Integer = FreeFile()
FileOpen(FileNumber, path & "ListofExcelFiles.txt", OpenMode.Output)
For Each Item As Object In ListBox1.Items
PrintLine(FileNumber, Item.ToString)
Next
FileClose(FileNumber)
Dim FILE_NAME As String = path & "ListofExcelFiles.txt"
If System.IO.File.Exists(FILE_NAME) = True Then
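' The list was written with PrintLine (system ANSI code page), so read it back the same way, not as UTF-7
Dim objReader As New System.IO.StreamReader(FILE_NAME, System.Text.Encoding.Default)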
Dim TextLine As String = ""
Do While objReader.Peek() <> -1
TextLine = objReader.ReadLine()
'***************************************************
Dim position As Integer = TextLine.LastIndexOf("\"c)
Dim getname As String
getname = TextLine.Substring(position + 1)
'MessageBox.Show(getname)
Dim extension = System.IO.Path.GetExtension(getname)
Dim result = getname.Substring(0, getname.Length - extension.Length)
'MessageBox.Show(result)
Dim newFolder As String = path & result & "\"
If IO.Directory.Exists(newFolder) Then
IO.Directory.Delete(newFolder, True)
IO.Directory.CreateDirectory(newFolder)
ElseIf Not IO.Directory.Exists(newFolder) Then
IO.Directory.CreateDirectory(newFolder)
End If
'Dim MSWordPageSetup As WdOrientation = WdOrientation.wdOrientLandscape
Dim MasterFileName As String = path & result
Dim MSWordExportFilePath As String = newFolder & result & ".pdf"
Dim MSWordExportFormat As WdExportFormat = WdExportFormat.wdExportFormatPDF
Dim MSWordOpenAfterExport As Boolean = False
Dim MSWordExportOptimizeFor As WdExportOptimizeFor = WdExportOptimizeFor.wdExportOptimizeForPrint
Dim MSWordExportRange As WdExportRange = WdExportRange.wdExportAllDocument
Dim MSWordStartPage As Int32 = 0
Dim MSWordEndPage As Int32 = 0
Dim MSWordExportItem As WdExportItem = WdExportItem.wdExportDocumentContent
Dim MSWordIncludeDocProps As Boolean = True
Dim MSWordKeepIRM As Boolean = True
Dim MSWordCreateBookmarks As WdExportCreateBookmarks = WdExportCreateBookmarks.wdExportCreateWordBookmarks
Dim MSWordDocStructureTags As Boolean = True
Dim MSWordBitmapMissingFonts As Boolean = True
Dim MSWordUseISO19005_1 As Boolean = False
TextBox1.Clear()
Dim oApp As New Outlook.Application
Dim OMItem As Outlook.MailItem
TextBox1.AppendText("Reading the .msg file" & vbNewLine)
OMItem = CType(oApp.CreateItemFromTemplate(MasterFileName & ".msg"), Outlook.MailItem)
Dim mailAttachments As Outlook.Attachments = OMItem.Attachments
Dim attachmentInfo As StringBuilder = New StringBuilder()
If Not IsNothing(mailAttachments) Then
For i As Integer = 1 To mailAttachments.Count
Dim currentAttachment As Outlook.Attachment = mailAttachments.Item(i)
If Not IsNothing(currentAttachment) Then
attachmentInfo.AppendFormat("#{0}", i)
attachmentInfo.AppendLine()
attachmentInfo.AppendFormat("File Name: {0}", currentAttachment.FileName)
attachmentInfo.AppendLine()
attachmentInfo.AppendFormat("Display Name: {0}", currentAttachment.DisplayName)
attachmentInfo.AppendLine()
attachmentInfo.AppendFormat("Type: {0}", currentAttachment.Type)
attachmentInfo.AppendLine()
attachmentInfo.AppendLine()
Dim attachmentName As String = currentAttachment.FileName
Dim counter As Integer = 0
Dim newFileName As String = attachmentName
While File.Exists(newFolder & newFileName)
counter = counter + 1
Dim positionAtt As Integer = newFileName.LastIndexOf("\"c)
Dim getAttname As String
getAttname = newFileName.Substring(positionAtt + 1)
Dim extensionAtt = System.IO.Path.GetExtension(getAttname)
Dim newFileNwmeWithoutExt = getAttname.Substring(0, getAttname.Length - extensionAtt.Length)
newFileName = String.Format("{0}({1})", newFileNwmeWithoutExt, counter.ToString())
newFileName = newFileName & extensionAtt
End While
If newFileName.Contains(".msg") Then
currentAttachment.SaveAsFile(newFolder & newFileName)
' Reuse oApp here: Outlook runs as a single instance, so creating a second
' Application object and calling Quit() on it would close the session that OMItem still needs below.
Dim OMAItem As Outlook.MailItem
'//TextBox1.AppendText("Reading the .msg file" & vbNewLine)
Dim positionAtt As Integer = newFileName.LastIndexOf("\"c)
Dim getAttname As String
getAttname = newFileName.Substring(positionAtt + 1)
Dim extensionAtt = System.IO.Path.GetExtension(getAttname)
Dim currentAttachmentName = getAttname.Substring(0, getAttname.Length - extensionAtt.Length)
OMAItem = CType(oApp.CreateItemFromTemplate(newFolder & newFileName), Outlook.MailItem)
Dim name As String = newFolder & currentAttachmentName
Dim SW1 As StreamWriter = New StreamWriter(name & ".html")
SW1.Write(OMAItem.HTMLBody)
SW1.Close()
OMAItem = Nothing
Dim wAppAtt As New Word.Application
Dim wdocAtt As Word.Document ' assigned by Documents.Open below, so no New is needed
Dim MSWordExportFilePathAtt As String = name & ".pdf"
'//TextBox1.AppendText("Reading the HTML file" & vbNewLine)
wdocAtt = wAppAtt.Documents.Open(name & ".html")
'//TextBox1.AppendText("Saving the PDF file" & vbNewLine)
wdocAtt.ExportAsFixedFormat(MSWordExportFilePathAtt, _
MSWordExportFormat, MSWordOpenAfterExport, _
MSWordExportOptimizeFor, MSWordExportRange, MSWordStartPage, _
MSWordEndPage, MSWordExportItem, MSWordIncludeDocProps, _
MSWordKeepIRM, MSWordCreateBookmarks, _
MSWordDocStructureTags, MSWordBitmapMissingFonts, _
MSWordUseISO19005_1)
wdocAtt.Close()
wAppAtt.Quit()
wdocAtt = Nothing
wAppAtt = Nothing
End If
End If
Next
End If
TextBox1.AppendText("Writing as HTML file" & vbNewLine)
Dim SW As StreamWriter = New StreamWriter(newFolder & result & ".html")
SW.Write(OMItem.HTMLBody)
SW.Close()
oApp.Quit()
OMItem = Nothing
oApp = Nothing
Dim wApp As New Word.Application
Dim wdoc As Word.Document ' assigned by Documents.Open below, so no New is needed
TextBox1.AppendText("Reading the HTML file" & vbNewLine)
wdoc = wApp.Documents.Open(newFolder & result & ".html")
TextBox1.AppendText("Saving the PDF file" & vbNewLine)
wdoc.ExportAsFixedFormat(MSWordExportFilePath, _
MSWordExportFormat, MSWordOpenAfterExport, _
MSWordExportOptimizeFor, MSWordExportRange, MSWordStartPage, _
MSWordEndPage, MSWordExportItem, MSWordIncludeDocProps, _
MSWordKeepIRM, MSWordCreateBookmarks, _
MSWordDocStructureTags, MSWordBitmapMissingFonts, _
MSWordUseISO19005_1)
wdoc.Close()
wApp.Quit()
wdoc = Nothing
wApp = Nothing
If System.IO.File.Exists(TextLine) = True Then
Dim mail As String = newFolder & getname
System.IO.File.Copy(TextLine, mail)
End If
Loop
End If
TextBox1.AppendText("All done" & vbNewLine)
MessageBox.Show("Converted")
Button1.Enabled = False
Me.Close()
End Sub
End Class
Thanks in advance.
-
Re: Image files not converted to html
When you open the .msg file, I bet you see a red "x" from the get-go, meaning it may not be a "conversion" issue at all. The first thing I would look at is a possible Outlook "Trust Center" issue. If you open the message, Outlook probably states:
"To help protect your privacy, Outlook prevented automatic download of some pictures in this message".
Open the Trust Center (File, Options, Trust Center, "Trust Center Settings") and look at the "Automatic Download" tab. You might want to adjust those settings depending on what setup you are comfortable with.
There is definitely no straightforward way to output to PDF, other than to use a "print to PDF" function or do essentially what you are already doing.
-
Re: Image files not converted to html
Thanks a lot for your reply.
Automatic download is not allowed in my Outlook settings.
I will need to check with the admin.
Thanks for helping to figure out the issue.
-
Re: Image files not converted to html
Hi, I took the .msg files that contain the images, and the images are displayed correctly when the .msg files are opened.
After converting to .html, I see that the images are not converted; instead a black "x" is displayed in the .html.
Automatic download is allowed now.
Any suggestions, please?
-
Re: Image files not converted to html
Quote:
Originally Posted by
vijay2482
Hi, I took the .msg files that contain the images, and the images are displayed correctly when the .msg files are opened.
After converting to .html, I see that the images are not converted; instead a black "x" is displayed in the .html.
Automatic download is allowed now.
Any suggestions, please?
I really don't think a black X shows in the HTML: do you mean in the HTML renderer (e.g. the web browser)? What does the HTML actually look like for the image?
-
1 Attachment(s)
Re: Image files not converted to html
Hi, thanks for the reply. I'm using Internet Explorer as my web browser.
Please see the attachment for how the images are displayed in the .html page.
Thanks in advance.
-
Re: Image files not converted to html
Again, what does the HTML look like for the image?
-
Re: Image files not converted to html
The attachment is the screenshot of the HTML.
One more issue: the French characters in the .msg files are not converted properly to .html.
-
Re: Image files not converted to html
No, that's the browser rendering the HTML. You need to show the actual HTML code; specifically for the image.
-
Re: Image files not converted to html
I'm not writing any special code for images.
Below is the code that converts the .msg file to .html and then to .pdf:
Code:
OMAItem = CType(oApp.CreateItemFromTemplate(newFolder & newFileName), Outlook.MailItem)
Dim name As String = newFolder & currentAttachmentName
Dim SW1 As StreamWriter = New StreamWriter(name & ".html")
SW1.Write(OMAItem.HTMLBody)
SW1.Close()
' No Quit() here: Outlook is a single instance and the main item is still read from it later
OMAItem = Nothing
Dim wAppAtt As New Word.Application
Dim wdocAtt As Word.Document ' assigned by Documents.Open below
Dim MSWordExportFilePathAtt As String = name & ".pdf"
wdocAtt = wAppAtt.Documents.Open(name & ".html")
wdocAtt.ExportAsFixedFormat(MSWordExportFilePathAtt, _
MSWordExportFormat, MSWordOpenAfterExport, _
MSWordExportOptimizeFor, MSWordExportRange, MSWordStartPage, _
MSWordEndPage, MSWordExportItem, MSWordIncludeDocProps, _
MSWordKeepIRM, MSWordCreateBookmarks, _
MSWordDocStructureTags, MSWordBitmapMissingFonts, _
MSWordUseISO19005_1)
wdocAtt.Close()
wAppAtt.Quit()
wdocAtt = Nothing
wAppAtt = Nothing
-
Re: Image files not converted to html
What does the HTML <img /> tag look like?
The IMG tag points to a location of an image, or a resource, or is base64 encoded. It's looking for a resource (Image) which isn't where it thinks it is. Mail items have resources with ID numbers which a mail application reads and formats. When converted to HTML, it needs to extract the images. The HTML image tag needs to point to the correct location.
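To see what the image tags in the saved file actually reference, the src values can be listed with a regular expression. This is only a minimal sketch: the module name ImgSrcLister and the function GetImageSources are made up for illustration, and the pattern assumes the src value is quoted, as Outlook-generated HTML usually does.

```vbnet
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module ImgSrcLister
    ' Hypothetical helper: returns the src value of every <img> tag in the HTML string.
    Public Function GetImageSources(ByVal html As String) As List(Of String)
        Dim sources As New List(Of String)()
        ' Group 1 captures the text between the quotes that follow src=.
        Dim pattern As String = "<img[^>]+src\s*=\s*[""']([^""']+)[""']"
        For Each m As Match In Regex.Matches(html, pattern, RegexOptions.IgnoreCase)
            sources.Add(m.Groups(1).Value)
        Next
        Return sources
    End Function
End Module
```

Passing OMItem.HTMLBody to this function should return entries that start with "cid:" for embedded images, which are exactly the references a browser cannot resolve.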
-
Re: Image files not converted to html
Hi,
I looked into the html of the page created and the image tag looks like this:
Code:
<img border=0 width=619 height=81 id="Image_x0020_2" src="cid:[email protected]"
alt="http://images.agcocorp.com/emailsignature/product_brands_v2.png">
Code:
<img width=146 height=158 id="Image_x0020_1" src="cid:[email protected]" alt="AGCO_logo">
I don't know what to make of it. Please let me know how to go forward.
Thanks
-
Re: Image files not converted to html
You will have to extract the image(s) and either:
* Put them in a storage location and modify the HTML to link to the image;
* Encode the image to base64 and replace the src with the base64 stream - some browsers may not render this correctly, however.
The src of the images is the resource ID in the message file.
HTML doesn't contain image data; it's a text-based file format.
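The first option can be sketched as a plain string replacement. The names CidLinker and LinkImageToFile are hypothetical, and the sketch assumes the image was saved into the same folder as the .html file (as the code earlier in this thread does with newFolder), so a bare file name works as a relative link.

```vbnet
Imports System

Module CidLinker
    ' Hypothetical helper: swaps every "cid:<contentId>" reference for a relative file name.
    ' Assumes the image file sits next to the .html file; file names with spaces or
    ' special characters may additionally need URL escaping.
    Public Function LinkImageToFile(ByVal html As String, ByVal contentId As String, ByVal fileName As String) As String
        Return html.Replace("cid:" & contentId, fileName)
    End Function
End Module
```

The modified string is then written to the .html file in place of the raw HTMLBody.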
-
Re: Image files not converted to html
I can get the name of the image file in my code, but how do I encode the image to base64 and replace the src with the base64 stream?
thanks in advance.
-
Re: Image files not converted to html
Hi, I have coded to get the image files and save them in the location where the .msg files are stored.
Code:
If newFileName.Contains(".png") Or newFileName.Contains(".jpeg") Or newFileName.Contains(".jpg") Or newFileName.Contains(".gif") Or newFileName.Contains(".bmp") Then
currentAttachment.SaveAsFile(newFolder & newFileName)
End If
How do I link these images with the HTML?
Please provide suggestions.
Thanks in advance.
-
Re: Image files not converted to html
I'm writing my HTML file as below:
Code:
Dim SW1 As StreamWriter = New StreamWriter(name & ".html", False, System.Text.Encoding.Default)
SW1.Write(OMAItem.HTMLBody)
SW1.Close()
In this case the HTML is already written, so how do I get the "src" from this file and do the encoding for the image file found in the HTML?
Any help will be appreciated.
Thanks in advance.
-
Re: Image files not converted to html
You will have to modify the HTML to properly reflect the path to the objects. It's just a string, so you could do it that way, or I'm sure you can treat it as an XML document and modify images through the document structure. Have a look at that.
-
Re: Image files not converted to html
Quote:
Originally Posted by
SJWhiteley
You will have to extract the image(s) and either:
* Put them in a storage location and modify the HTML to link to the image;
* Encode the image to base64 and replace the src with the base64 stream - some browsers may not render this correctly, however.
The src of the images is the resource ID in the message file.
HTML doesn't contain image data; it's a text-based file format.
That was my next guess. Good call.
-
Re: Image files not converted to html
I used this code to convert to a base64 stream:
Code:
Dim encodingTypeString As String = String.Empty
If newFileName.Contains(".png") Then
currentAttachment.SaveAsFile(newFolder & newFileName)
encodeType = ImageFormat.Png
encodingTypeString = "data:image/png;base64,"
decodingString = encodingTypeString
replaceString = encodingTypeString & ImageToBase64(System.Drawing.Image.FromFile(newFolder & newFileName), encodeType)
replaceString = replaceString.Replace("data:image/png;base64,", "")
' MessageBox.Show(replaceString)
ElseIf newFileName.Contains(".jpeg") Then
currentAttachment.SaveAsFile(newFolder & newFileName)
encodeType = ImageFormat.Jpeg
encodingTypeString = "data:image/jpeg;base64,"
decodingString = encodingTypeString
replaceString = encodingTypeString & ImageToBase64(System.Drawing.Image.FromFile(newFolder & newFileName), encodeType)
replaceString = replaceString.Replace("data:image/jpeg;base64,", "")
' MessageBox.Show(replaceString)
ElseIf newFileName.Contains(".gif") Then
currentAttachment.SaveAsFile(newFolder & newFileName)
encodeType = ImageFormat.Gif
encodingTypeString = "data:image/gif;base64,"
decodingString = encodingTypeString
replaceString = encodingTypeString & ImageToBase64(System.Drawing.Image.FromFile(newFolder & newFileName), encodeType)
replaceString = replaceString.Replace("data:image/gif;base64,", "")
' MessageBox.Show(replaceString)
ElseIf newFileName.Contains(".bmp") Then
currentAttachment.SaveAsFile(newFolder & newFileName)
encodeType = ImageFormat.Bmp
encodingTypeString = "data:image/bmp;base64,"
decodingString = encodingTypeString
replaceString = encodingTypeString & ImageToBase64(System.Drawing.Image.FromFile(newFolder & newFileName), encodeType)
replaceString = replaceString.Replace("data:image/bmp;base64,", "")
'MessageBox.Show(replaceString)
End If
' ImageToBase64 is a separate helper declared at class level, outside the If block above:
Public Function ImageToBase64(ByVal image As Image, ByVal format As ImageFormat) As String
Using ms As New MemoryStream()
image.Save(ms, format)
Dim imageBytes As Byte() = ms.ToArray()
Dim base64String As String = Convert.ToBase64String(imageBytes)
Return base64String
End Using
End Function
But I'm not sure how to use this value in the src.
Any code sample would be very helpful.
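For the src attribute, the base64 text has to keep its "data:image/...;base64," prefix, so stripping it again (as the Replace calls above do) leaves a value the browser cannot use. A minimal sketch follows; DataUriBuilder, ToDataUri and FileToDataUri are hypothetical names. It encodes the saved file's bytes directly rather than re-encoding through System.Drawing, and falling back to PNG for unknown extensions is an assumption.

```vbnet
Imports System
Imports System.IO

Module DataUriBuilder
    ' Maps a file extension to an image MIME type; unknown extensions fall back to PNG (assumption).
    Private Function MimeTypeFor(ByVal extension As String) As String
        Select Case extension.ToLowerInvariant()
            Case ".jpg", ".jpeg"
                Return "image/jpeg"
            Case ".gif"
                Return "image/gif"
            Case ".bmp"
                Return "image/bmp"
            Case Else
                Return "image/png"
        End Select
    End Function

    ' Builds the complete value for src: prefix plus base64 payload.
    Public Function ToDataUri(ByVal imageBytes As Byte(), ByVal extension As String) As String
        Return "data:" & MimeTypeFor(extension) & ";base64," & Convert.ToBase64String(imageBytes)
    End Function

    ' Convenience wrapper for an image file that has already been saved to disk.
    Public Function FileToDataUri(ByVal imagePath As String) As String
        Return ToDataUri(File.ReadAllBytes(imagePath), Path.GetExtension(imagePath))
    End Function
End Module
```

The returned string replaces the whole "cid:..." value inside the src attribute. Whether Word honours data URIs when it opens the .html for PDF export is not confirmed in this thread, so linking to the saved file may be the safer route.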
-
Re: Image files not converted to html
Google 'base64 image tag'.
-
Re: Image files not converted to html
Thanks.
I could get the base64 stream for an image, but I'm not sure how to modify the <img> tag with the base64 value and save it as a Word or HTML document.
-
Re: Image files not converted to html
Quote:
Originally Posted by
vijay2482
Thanks.
I could get the base64 stream for an image, but I'm not sure how to modify the <img> tag with the base64 value and save it as a Word or HTML document.
It's just a text file or string. Do you know how to modify strings? There are lots and lots of mechanisms to do that. You would replace the image source with the base64 string, or point it to the file.
-
Re: Image files not converted to html
Hello,
I can save the base64 value into a text file or a string.
My issue is that the HTML file is written directly from the HTMLBody:
Code:
Dim SW As StreamWriter = New StreamWriter(newFolder & result & ".html", False, System.Text.Encoding.Default)
SW.Write(OMItem.HTMLBody)
SW.Close()
Do I need to open that HTML file and edit the source of the image?
If so, how do I search for the "src" in the HTML file? There is more than one image in my .msg file, so how do I identify which image source has to be linked with which base64 string or file?
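One way to pair each src with a saved attachment is a naming heuristic: in the tags posted earlier the content ID has the form "cid:name@suffix", and the part before the "@" often matches the attachment's FileName. That is an assumption, not an Outlook guarantee, and CidNames and FileNameFromCid are made-up names for this sketch.

```vbnet
Imports System

Module CidNames
    ' Heuristic (assumption): for "cid:image002.png@01D0ABCD.12345678" this returns
    ' "image002.png", the text between "cid:" and the first "@".
    ' Returns Nothing when the src is not a cid: reference.
    Public Function FileNameFromCid(ByVal src As String) As String
        If Not src.StartsWith("cid:", StringComparison.OrdinalIgnoreCase) Then Return Nothing
        Dim id As String = src.Substring(4)
        Dim atPos As Integer = id.IndexOf("@"c)
        If atPos >= 0 Then id = id.Substring(0, atPos)
        Return id
    End Function
End Module
```

The result can be compared with currentAttachment.FileName while looping over the attachments. If the names do not line up for a given message, the heuristic does not apply and the mapping has to come from somewhere else.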
-
Re: Image files not converted to html
I have written the below code to get the src of image from html and to replace with a file.
Code:
Dim b As String = "C:\aaaa\image002.png"
Dim objReader2 As New System.IO.StreamReader(newFolder & result & ".html",System.Text.Encoding.Default)
Dim TextLine2 As String = ""
Do While objReader2.Peek() <> -1
TextLine2 = objReader2.ReadLine()
Try
Dim RegexObj As New Regex("<img[^>]+src=[""']([^""']+)[""']", RegexOptions.Singleline Or RegexOptions.IgnoreCase)
Dim MatchResults As Match = RegexObj.Match(TextLine2)
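While MatchResults.Success
MessageBox.Show(MatchResults.Groups(1).Value)
' Strings are immutable: Replace returns a new string, so this line on its own changes nothing in the file
MatchResults.Groups(1).Value.Replace(MatchResults.Groups(1).Value, b)
' Advance only after the current match has been used, otherwise the first match is skipped
MatchResults = MatchResults.NextMatch()
End While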
Catch ex As ArgumentException
End Try
Loop
objReader2.Close()
Not sure how to save the html with the modified src.
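One possible way to save the change is to load the whole file into a string, rewrite the src values with Regex.Replace, and write the string back. This is a sketch under assumptions rather than the final attached solution: HtmlImageRewriter, RewriteImageSources and RewriteHtmlFile are hypothetical names, the dictionary maps an original src (for example a "cid:..." value) to its replacement, and the multi-line lambda needs Visual Basic 2010 or later.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module HtmlImageRewriter
    ' Rewrites every <img> whose src is a key in replacements; all other tags are left untouched.
    Public Function RewriteImageSources(ByVal html As String, ByVal replacements As IDictionary(Of String, String)) As String
        ' Group 1 = tag text up to and including the opening quote,
        ' group 2 = the src value, group 3 = the closing quote.
        Dim pattern As String = "(<img[^>]+src\s*=\s*[""'])([^""']+)([""'])"
        Return Regex.Replace(html, pattern,
            Function(m As Match) As String
                Dim newSrc As String = Nothing
                If replacements.TryGetValue(m.Groups(2).Value, newSrc) Then
                    Return m.Groups(1).Value & newSrc & m.Groups(3).Value
                End If
                Return m.Value
            End Function,
            RegexOptions.IgnoreCase)
    End Function

    ' Reads the whole file, rewrites it in memory, and saves it back with the same encoding.
    Public Sub RewriteHtmlFile(ByVal htmlPath As String, ByVal replacements As IDictionary(Of String, String), ByVal fileEncoding As Encoding)
        Dim html As String = File.ReadAllText(htmlPath, fileEncoding)
        File.WriteAllText(htmlPath, RewriteImageSources(html, replacements), fileEncoding)
    End Sub
End Module
```

Working on the whole string instead of line by line also covers tags that wrap across lines, like the first img tag posted earlier. The rewrite could equally be applied to OMItem.HTMLBody before the StreamWriter call, which avoids reopening the file.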
-
1 Attachment(s)
(Solved - Conversion of Outlook email to PDF files) Image files not converted to html
Hi all,
With the suggestions from this forum, I have solved the issue.
The application converts .msg files to .pdf files.
Code: see the attachment.
I'm posting the entire code so that it will be helpful for those who want to do the same conversion.
Regards.