dcsimg
Results 1 to 9 of 9

Thread: Getting source HTML from an MS Word webpage as a VB String

  1. #1

    Thread Starter
    New Member
    Join Date
    Jul 2006
    Location
    Scotland
    Posts
    15

    Getting source HTML from an MS Word webpage as a VB String

    Hello

    Any help on this would be much appreciated...

    My VB6 application creates letters in Microsoft Word based on a template but with specific info for each recipient.
    I want to store each individually-generated letter in my SQL Server database, because each letter must be retrievable in read-only mode. It's not viable to have each document stored separately on a server.

    So, my thinking is to:
    - generate each letter as a word doc
    - behind the scenes, save the letter as a html file
    - get the html source code, then pass this as a nvarchar to my SQL Server database

    What i need to know is:
    how can I save the HTML source code from a MS Word HTML document into a VB string variable? Once in the string variable, i can pass this to my insert stored procedure.

    I can view the source code ok when I open the HTML doc in IE, it's just the automated grabbing of this data that is eluding me...

    I've tried saving the ActiveDocument.HTMLProject.HTMLProjectItems(1).Text to the clipboard then "pasting" this into a string, but my letter isn't getting recreated when I load this HTML back into a word doc.

    Any ideas?
    (vital stats: VB6, Office 2002, Windows XP Sp2.)

    thanks

  2. #2
    Lively Member
    Join Date
    Oct 2007
    Posts
    68

    Re: Getting source HTML from an MS Word webpage as a VB String

    I don't know if it's available in office 2002, but have you considered saving to XML format instead of html?

    ActiveDocument.SaveAs FileName:=strDocName, _
    FileFormat:=wdFormatXML

    Either way, if your letter is always going to be small, and you're always going to save it as XML or HTML, you can simply save the file as the types available to msword, then read the temp files contents back into a string, and pass that to your procedure.

    There is a lot of help in the office .CHM for msword. Do a search for vbawd*.chm and you should find it.


    However, if you're going to be saving larger files and/or different file types, then you may want to consider just saving them in the original format, and first zipping them, then streaming them as binary to the database.


    We do something similar but for any file type, not just word docs - pdfs, xls, .doc, msg, etc.

    Instead of saving in the original format, to conserve space on the sql server, we compress it first. Then instead of trying to pass text only as a parm into a stored procedure, we Stream it.

    1. Save file as xml or html file to client's temp directory.

    2. Compress or Zip the file by creating an empty zip file and then using shell32 to copy its contents into the empty zip. Set reference to "Microsoft Shell Controls and Automation"

    Code:
    Private Function Create_Empty_Zip(sFileName As String) As Boolean
    
        Dim EmptyZip()  As Byte
        Dim J           As Integer
    
        On Error GoTo EH
        Create_Empty_Zip = False
    
        '//create zip header
        ReDim EmptyZip(1 To 22)
    
        EmptyZip(1) = 80
        EmptyZip(2) = 75
        EmptyZip(3) = 5
        EmptyZip(4) = 6
        
        For J = 5 To UBound(EmptyZip)
            EmptyZip(J) = 0
        Next
    
        '//create empty zip file with header
        Open sFileName For Binary Access Write As #1
    
        For J = LBound(EmptyZip) To UBound(EmptyZip)
            Put #1, , EmptyZip(J)
        Next
        
        Close #1
    
        Create_Empty_Zip = True
    
    EH:
        
        If Err.Number <> 0 Then
            MsgBox Err.Description, vbExclamation, "Error"
        End If
        
    End Function
    Add your XML or HTML or any file to .ZIP

    Code:
    Private Sub AddFile2Zip(sFileSource as string, sFileDest as string)
    
        '//copies single file to zip file
        
        Dim ShellClass  As Shell32.Shell
        Dim Filedest    As Shell32.Folder
        
        On Error GoTo EH
        
            
        '//create empty zip file with header
        If Not Create_Empty_Zip(sFileDest) Then
            GoTo EH
        End If
            
        Set ShellClass = New Shell32.Shell
        Set Filedest = ShellClass.NameSpace(sFileDest)
        
        Call Filedest.CopyHere(sFileSource, 20)
        
    EH:
    
        If Err.Number <> 0 Then
            MsgBox Err.Description, vbExclamation, "Error"
        End If
    
        Set ShellClass = Nothing
        Set Filedest = Nothing
        
    End Sub
    You should now have your file saved into a zip file on the client's temp directory, ready for streaming to the server. This is a bit more involved, but more flexible than trying to pass finite text into a proc, plus it will support other files and binaries that don't readily convert to a text format such as xml or html.


    Code:
    Private Function Stream_File() As Boolean
        
        Dim mStream     As ADODB.Stream
        Dim objQy       As ADODB.Command
        Dim objParm     As ADODB.Parameter
        Dim rsWork      As ADODB.Recordset
        Dim wString     As String
        Dim iStream()   As Byte
        Dim GetCmd      As String
        
        On Error GoTo EH
            
        Stream_File = False
        
        '=====================================================================
        'add new object info relating to object, -- except the binary
        ' add the binary part using ado.update method
        '=====================================================================
               
        '//Initialize objects
        Set objQy = Nothing
        Set objParm = Nothing
        Set mStream = Nothing
        Set objQy = New ADODB.Command
        Set objParm = New ADODB.Parameter
        Set mStream = New ADODB.Stream
        
        '//Read the file into the Stream object so we know the file size to store
        mStream.Type = adTypeBinary
        mStream.Open
        mStream.LoadFromFile sTempFile
        
        '//copy the data stream into a byte array
        ReDim iStream(mStream.Size)
        iStream = mStream.Read
    
        '//Inserting the attachment is a two step process.
        '//First the row needs to be inserted -- with NULL placed in the binary field.
        '//Then the row is updated -- and the binary field is streamed in.
    
        With objQy
           Set .ActiveConnection = g_objADOConnection
               .CommandType = adCmdStoredProc
               .CommandText = "mystoredproc"
    
            Set objParm = .CreateParameter("parm1", adSmallInt, adParamOutput, 12)
            .Parameters.Append objParm
        
            Set objParm = .CreateParameter("parm2", adInteger, adParamInput, , parm2)
            .Parameters.Append objParm
            
            .Execute
            
        End With
        
        Set rsWork = Nothing
        Set rsWork = New ADODB.Recordset
        
        GetCmd = "Select field1, field2 from MYTABLE where FIELD1 = parm1 and Field2 = parm2"
    
        
        rsWork.CursorLocation = adUseClient
        rsWork.Open GetCmd, g_objADOConnection, adOpenStatic, adLockOptimistic ', adAsyncExecute
        
        Do While (rsWork.State And adStateExecuting) = adStateExecuting
            Sleep (100)
        Loop
    
        '//Stream the file into the Image column
        rsWork!OBJT_IMG = iStream
        rsWork.Update
        
        Stream_File = True
    
    EH:
        
        Set objQy = Nothing
        Set objParm = Nothing
        Set rsWork = Nothing
        Set mStream = Nothing
        Erase iStream
    
        
        If Err.Number <> 0 Then
            Err.Raise Err.Number, "Stream_File:" & Err.Source, Err.Description
        End If
    
    End Function

  3. #3

    Thread Starter
    New Member
    Join Date
    Jul 2006
    Location
    Scotland
    Posts
    15

    Re: Getting source HTML from an MS Word webpage as a VB String

    hi Sweeve
    thanks for the reply - the letters i'll be storing will at most be 2 or 3 pages. I was thinking HTML as this would cover all the rich text in the doc, plus i'm more attuned to HTML than XML, but i'll have a look at the word help and see what's there.
    the code samples you've provided will come in handy if i need to save anything larger or non-XML-able but i don't think i'll need them just now, so saying if the customer decides to include an image in their letters instead of just text i may be giving that a shot!
    hopefully i'll be able to use the word help to get the letter contents into a string. Will post back after investigating...
    thanks

  4. #4

    Thread Starter
    New Member
    Join Date
    Jul 2006
    Location
    Scotland
    Posts
    15

    Re: Getting source HTML from an MS Word webpage as a VB String

    I've looked into the filetypes available for saving in Word 2002, and unfortunately XML isn't one of them.

    Sweeve - you mentioned that (quoting) "you can simply save the file as the types available to msword, then read the temp files contents back into a string, and pass that to your procedure."

    I couldn't find anything in the Word .chm help file on how to read the contents of a temp file into a string, do you have any code as an example?

    Also, if I was to store the zipped word docs as the code above shows, how would I go about getting the files back from the database and displayed on screen?

    thanks
    Last edited by Smeee; Oct 30th, 2007 at 06:36 AM.

  5. #5
    Lively Member
    Join Date
    Oct 2007
    Posts
    68

    Re: Getting source HTML from an MS Word webpage as a VB String

    I've looked into the filetypes available for saving in Word 2002, and unfortunately XML isn't one of them.

    Sweeve - you mentioned that (quoting) "you can simply save the file as the types available to msword, then read the temp files contents back into a string, and pass that to your procedure."


    Ah well. Saving as xml is probably confined to office 2003 and higher.

    If you want to save the file as html, then I suggest saving the file to a temp directory with the html extension using the built-in word SaveAs function. Note however, msword wraps this html in a lot of formatting specific to msword. It's not generic html.

    Following is very small example of using the built-in SaveAs function to html.

    I early bound the msword typelib, but you may want to late bind it as object and do a createobject on it.

    (i.e., dim objword as object, then CreateObject("Word.Application"). This will make it compatible with later and earlier typelibs.

    Code:
        Dim J As Integer
        
        Const wdFormatHTML = 8
        Const wdFormatFilteredHTML = 10
        
        Dim objword As Word.Application    
        Set objword = New Word.Application
        
        objword.Documents.Open "c:\test.doc", , True
        objword.Documents(1).SaveAs "test.html", wdFormatFilteredHTML
    
        '//you can look at the objword.document(1).Content here to see what's in the doc
    
        For J = 1 To objword.Documents.Count
            objword.Documents(J).Close
        Next
    
        Set objword = Nothing
    This should save the document as an html document.

    Note, most stored procedures wont accept a parm as large as your html text - usually not greater than say 255 characters. This is why I recommend just zipping the original image and streaming to/from an image column on your database.

    I couldn't find anything in the Word .chm help file on how to read the contents of a temp file into a string, do you have any code as an example?

    There won't be anything in the Word.chm for that, that's just straight VB. Once you've saved your file as HTML, It's basically just reading in the html file.

    Code:
        Dim strInput    As String
        Dim strfile      As String
    
    Open myhtmlFileName For Input As #1
                
    Do While Not EOF(1)
        Line Input #1, strInput
        strfile = strfile & strInput & vbCrLf
    Loop
                
    Close #1
    You should be able to see your html source in strfile.

    Also, if I was to store the zipped word docs as the code above shows, how would I go about getting the files back from the database and displayed on screen?

    Just do the streaming in reverse.

    Code:
        Dim GetCmd          As String
        Dim rsWork          As ADODB.Recordset
        Dim mStream         As ADODB.Stream
        Dim iStream()       As Byte
    
        ReDim iStream(0)
        
        '//retrieve image from table
        GetCmd = "execute mystoredproc @parm1=" & parm1
        
        Set rsWork = New ADODB.Recordset
        rsWork.Open GetCmd, myadoconnection, adOpenForwardOnly, adLockReadOnly
    
    
    '//return image into byte array
        If Not rsWork.EOF And Not rsWork.BOF Then
            If Len(rsWork.Fields(0) & "") > 0 Then
                ReDim iStream(Len(Trim$(rsWork.Fields(0)))) As Byte
                iStream = rsWork.Fields(0)
            Else
                '//err handling
            End If
        Else
    
    '//stream to temp file
        If UBound(iStream) > 0 Then
    
            '//write binary stream from rs to file
            Set mStream = New ADODB.Stream
    
            With mStream
                .Type = adTypeBinary
                .Open
    
                .Write iStream
                .SaveToFile sTempFile, adSaveCreateNotExist
                
                .Close
            End With
        End If

    If you've zipped the image, just use shell32 to unzip your file. Shell32 treats the zip file as a directory, so use the CopyHere from source.zip to dest.

    Hope this helps

  6. #6
    PowerPoster
    Join Date
    Feb 2006
    Location
    East of NYC, USA
    Posts
    5,692

    Re: Getting source HTML from an MS Word webpage as a VB String

    Since the template is always the same, why store it for every letter? Store the DATA for each letter in a single record (each item in a field) and recreate the letter with code at run time, using the stored data. That way your database would be much more highly normalized.
    The most difficult part of developing a program is understanding the problem.
    The second most difficult part is deciding how you're going to solve the problem.
    Actually writing the program (translating your solution into some computer language) is the easiest part.

    Please indent your code and use [HIGHLIGHT="VB"] [/HIGHLIGHT] tags around it to make it easier to read.

    Please Help Us To Save Ana

  7. #7
    Lively Member
    Join Date
    Oct 2007
    Posts
    68

    Re: Getting source HTML from an MS Word webpage as a VB String

    I would agree with AL42. If all this is ever going to be is a mailmerge, then storing just the data is the way to go.

    Less resources on the server
    Less hassle programatically
    Less code to maintain
    Your dba's are happier (they generally hate image fields ).

    However if any pics, images or binary information are going to be saved in the documents, or if any other types of files are ever going to be saved in the future (spreadsheets, emails, pdfs, etc) or if you need the original documents for legal reasons, then saving the whole file is one way to go.

    Also I generally don't like lan based solutions, but the files could also be housed there instead of streaming them to a database - especially when things like 15-20mb pdfs are the norm with a large client base.

  8. #8

    Thread Starter
    New Member
    Join Date
    Jul 2006
    Location
    Scotland
    Posts
    15

    Re: Getting source HTML from an MS Word webpage as a VB String

    Unfortunately I can't guarantee that the template will remain the same - it's possible that the corporate logo/font or even office title will change as it has done very recently, so i'll need to be able to recreate a carbon copy of exactly what was sent out, rather than simply adding the correct data to the latest copy of the template. Wish it could be otherwise though!...

    There's a couple of other complications when it comes to saving files on our LAN, so that's why I was looking to get content and formatting into the db.

    there is a workaround i've found, which is to save the RichText from the word doc into a RichTextBox on the VB form, then using the GetData event to store the RichText encoding and content to the db. Was hoping for something a bit less clunky but it'll have to wait until we get Office 2003 here!

    Sweeve - Will give the code above a try as well though as it'll be very useful for other file types.

    thanks for the advice

  9. #9

    Thread Starter
    New Member
    Join Date
    Jul 2006
    Location
    Scotland
    Posts
    15

    Re: Getting source HTML from an MS Word webpage as a VB String

    Hi Sweeve

    I've finally got this up and running - a huge thankyou for all your code and advice.
    I've got as far as saving the binary to the db, and retrieving the document into a zip file.
    However it's unzipping the zip file that's escaping me now - can you help?

    you mentioned in a previous post:

    "If you've zipped the image, just use shell32 to unzip your file. Shell32 treats the zip file as a directory, so use the CopyHere from source.zip to dest"

    I've tried various flavours of parameter with CopyHere but to no avail - do i need to set a new reference to anything? any sample code for this?

    thanks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Featured


Click Here to Expand Forum to Full Width