Thread: Read Word Document file

  1. #1
    KSP
    Guest

    Question Read Word Document file

    Hi...
    I am a beginner in Visual Basic. I am using VB 6.0.
    I want to read a Word document and search it for specified words.
    I am having a problem reading the Word document.
    Reading *.txt is easy and works fine, but when I read *.doc I run into problems.
    It does not work the same way, so if anyone knows the details, please let me know.
    Thank you

  2. #2
    -= B u g S l a y e r =- peet's Avatar
    Join Date
    Aug 2000
    Posts
    9,629
    Not sure what you are after,
    but I think it's something along these lines:

    VB Code:
    ' Requires a project reference to the Microsoft Word Object Library.
    Private objWord As Word.Application
    Private wd As Word.Document

    Private Sub Command1_Click()
        Dim myRange As Word.Range
        Dim sSearchfor As String
        sSearchfor = InputBox("What do you want to search for?")
        If sSearchfor = "" Then Exit Sub

        ' Start Word only if we do not already hold an instance.
        If objWord Is Nothing Then
            Set objWord = CreateObject("Word.Application")
        End If
        DoEvents
        Set wd = objWord.Documents.Open("c:\Test.doc")

        ' Search the whole document body for the text.
        Set myRange = wd.Content
        myRange.Find.Execute FindText:=sSearchfor, Forward:=True
        If myRange.Find.Found Then
            MsgBox "The document contains '" & sSearchfor & "'", vbInformation
        Else
            MsgBox "The document does NOT contain '" & sSearchfor & "'", vbInformation
        End If

        ' Close the document without saving, then shut Word down.
        wd.Close SaveChanges:=False
        Set wd = Nothing
        objWord.Quit
        Set objWord = Nothing
    End Sub
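
    If you also need to know how many times the text occurs, the same Find object can be run in a loop. This is only a rough sketch and not part of the sample above: the name CountMatches is made up for illustration, the document is passed as a plain Object (late bound), and the literal 0 is assumed to stand for Word's wdFindStop constant.

    ```vb
    ' Hypothetical helper: counts how often sSearchfor occurs in an open Word document.
    Function CountMatches(ByVal wd As Object, ByVal sSearchfor As String) As Long
        Dim rng As Object

        Set rng = wd.Content
        With rng.Find
            .ClearFormatting
            .Text = sSearchfor
            .Forward = True
            .Wrap = 0            ' wdFindStop: stop at the end of the document
            ' Each successful Execute moves rng onto the match,
            ' so the next Execute continues after it.
            Do While .Execute
                CountMatches = CountMatches + 1
            Loop
        End With
    End Function
    ```

    It would be called while the document is still open, for example: MsgBox CountMatches(wd, sSearchfor) & " match(es)"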
    -= a peet post =-

  3. #3
    Hyperactive Member MetallicaD's Avatar
    Join Date
    Feb 2001
    Location
    Tallahassee, FL
    Posts
    488
    I assume that you are trying to read the .doc file directly, the same way you read the .txt. This won't work because Micro$oft uses a proprietary encoding standard to encrypt (somewhat) the contents of your files. This means that you can only open and read your .doc file using Micro$oft's preferred tool: MS Word.

    You need to actually automate Word to open the file. Then you can easily access the Find functionality built into Word.

    I am sure many postings have been made on this forum on how to automate Word to do the above. Search around.

    -mcd
    [vbcode]
    '*****************************
    MsgBox "MCD :: [email protected]", vbInformation + vbOKOnly, "User"
    '*****************************
    [/vbcode]

  4. #4
    Hyperactive Member MetallicaD's Avatar
    Join Date
    Feb 2001
    Location
    Tallahassee, FL
    Posts
    488
    And there we have it. peet got to it before I could even submit my posting! That's what you need.

  5. #5
    -= B u g S l a y e r =- peet's Avatar
    Join Date
    Aug 2000
    Posts
    9,629
    Now he's got a sample AND an explanation.

    I really should get better at commenting my code.
    -= a peet post =-

  6. #6
    Hyperactive Member MetallicaD's Avatar
    Join Date
    Feb 2001
    Location
    Tallahassee, FL
    Posts
    488
    Well, from what I see of your code around this forum, it's clean enough that someone should be able to read it without comments.

    I think comments are evil, but you have to write them, especially in production systems.

    -mcd

  7. #7
    WorkHorse
    Guest
    Quote:
    Micro$oft uses a proprietary encoding standard to encrypt (somewhat) the contents of your files. This means that you can only open and read your .doc file using micro$ofts preferred tool.. MS WORD..

    you need to actually automate word to open up the file

    Not true.

    DOC files are not any more "encrypted" than RTF or HTML files. The file has to hold the formatting codes for Word, and Word can do a lot of formatting, so you get more "junk" than in RTF, TXT, or HTM (but it isn't encrypted to only work in Word, like PDF files apparently are). The problem with trying to open Word files like text files is that a DOC file has a Chr(26) near the beginning, just after the "ÐÏ à¡±". In text mode Chr(26) is treated as an end-of-file marker, so reading past it raises an "Input past end of file" error. So you need to read in binary mode. Then the trick is to sort out the actual document text from the formatting data. I'm working with Word 2000 and it seems to be consistent in the binary format. This should get your file text without using Word at all:

    VB Code:
    Private Sub Command1_Click()
        GetWordDocText
    End Sub

    Sub GetWordDocText()

        Dim strDocPath As String
        Dim strDocText As String
        Dim intFile As Integer
        ' Long, not Integer: offsets above 32,767 would overflow an Integer.
        Dim lngDocTextStart As Long
        Dim lngDocTextEnd As Long

        strDocPath = "C:\My Documents\BlahBlah.doc"

        ' Read the whole file into a string in binary mode.
        intFile = FreeFile
        Open strDocPath For Binary Access Read As #intFile
            strDocText = Space$(LOF(intFile))
            Get #intFile, , strDocText
        Close #intFile

        ' Find the key character just before the actual text.
        lngDocTextStart = InStr(1, strDocText, "Ù")
        If lngDocTextStart = 0 Or lngDocTextStart + 4 > Len(strDocText) Then
            MsgBox "Text marker not found."
            Exit Sub
        End If
        ' Text starts after the first Ù & 3 null characters.
        lngDocTextStart = lngDocTextStart + 4

        ' Text ends with a null (or at end of file if there is none).
        lngDocTextEnd = InStr(lngDocTextStart + 1, strDocText, Chr$(0))
        If lngDocTextEnd = 0 Then lngDocTextEnd = Len(strDocText) + 1

        strDocText = Mid$(strDocText, lngDocTextStart, lngDocTextEnd - lngDocTextStart)

        MsgBox strDocText

    End Sub
    ps: I believe the file format is RichEdit3.0.
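
    The Chr(26) near the start of the file can be checked directly. A rough sketch, assuming the file is a Word 97-2003 style .doc, which begins with the 8-byte OLE compound file signature D0 CF 11 E0 A1 B1 1A E1 (shown as "ÐÏ à¡±" in a text editor, with &H1A = Chr(26) as the seventh byte). The function name HasDocSignature is made up for illustration.

    ```vb
    ' Hypothetical helper: True if the file starts with the OLE compound file signature.
    Function HasDocSignature(ByVal strPath As String) As Boolean
        Const SIG_HEX As String = "D0CF11E0A1B11AE1"
        Dim bytSig(0 To 7) As Byte
        Dim strHex As String
        Dim intFile As Integer
        Dim i As Long

        ' Read the first 8 bytes in binary mode, so Chr(26) is not treated as end of file.
        intFile = FreeFile
        Open strPath For Binary Access Read As #intFile
            Get #intFile, 1, bytSig
        Close #intFile

        ' Convert the bytes to a two-digit hex string and compare.
        For i = 0 To 7
            strHex = strHex & Right$("0" & Hex$(bytSig(i)), 2)
        Next i

        HasDocSignature = (strHex = SIG_HEX)
    End Function
    ```

    For example, HasDocSignature("C:\My Documents\BlahBlah.doc") could be called before GetWordDocText to skip files that are not in this format.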

  8. #8
    Hyperactive Member MetallicaD's Avatar
    Join Date
    Feb 2001
    Location
    Tallahassee, FL
    Posts
    488
    Excellent information you provided here, WorkHorse. I didn't know that you could read a .doc without Word by using a binary read. I was always under the impression that, while .docs were not actually encrypted, they were encoded in a semi-plain-text manner that you could sort of read, but that you could not get complete, accurate information unless you used Word (take, for example, all of this junk in the file: "ÿ ÿ ö Ö€€€ €€€ Ö€€€").

    Thanks for the information!

    -mcd
