-
Apr 17th, 2002, 01:38 PM
#1
Read Word Document file
Hi...
I am a beginner at Visual Basic. I am using VB 6.0.
I want to read a Word document and search it for specified words.
I am having problems when I read a Word document.
Reading *.txt files is easy and works fine, but when I read a *.doc file some problems arise.
The contents don't come out as plain text, so if anyone knows the details, please let me know.
Thank you
-
Apr 17th, 2002, 01:52 PM
#2
-= B u g S l a y e r =-
Not sure what you are after,
but I think it's something along these lines:
VB Code:
' Needs Project > References > Microsoft Word x.0 Object Library
Private objWord As Word.Application
Private wd As Word.Document

Private Sub Command1_Click()
    Dim myRange As Word.Range
    Dim sSearchfor As String
    sSearchfor = InputBox("What do you want to search for?")
    If sSearchfor = "" Then Exit Sub
    If objWord Is Nothing Then
        Set objWord = CreateObject("Word.Application")
    Else
        Set objWord = GetObject(, "Word.Application")
    End If
    DoEvents
    Set wd = objWord.Documents.Open("c:\Test.doc")
    Set myRange = wd.Content
    myRange.Find.Execute FindText:=sSearchfor, Forward:=True
    If myRange.Find.Found Then
        MsgBox "The document contains: '" & sSearchfor & "'", vbInformation
    Else
        MsgBox "The document does NOT contain: '" & sSearchfor & "'", vbInformation
    End If
    ' Close the document without saving, then shut Word down
    wd.Close SaveChanges:=False
    Set wd = Nothing
    objWord.Quit
    Set objWord = Nothing
End Sub
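The original question asks about searching for several words, not just one. A minimal sketch, assuming the same Word object library reference as the code above: a hypothetical helper `DocContains` runs a fresh `Find` over the whole document for each word, because a successful `Find` on a Range shrinks that Range to the match. `ReportWords` and its word list are made up for illustration. Call them while the document is still open, i.e. before it is closed and released.

```vb
' Hypothetical helper (not from this thread): True if sText occurs
' anywhere in oDoc. Needs the Microsoft Word x.0 Object Library reference.
Private Function DocContains(ByVal oDoc As Word.Document, ByVal sText As String) As Boolean
    Dim oRange As Word.Range
    ' Fresh whole-document range for every search, since a successful
    ' Find redefines the range to cover only the match.
    Set oRange = oDoc.Content
    With oRange.Find
        .ClearFormatting
        .Text = sText
        .Forward = True
        .Wrap = wdFindStop
        .MatchCase = False
        ' Execute returns True when the text is found
        DocContains = .Execute
    End With
End Function

' Example use with a hypothetical list of words to look for.
Private Sub ReportWords(ByVal oDoc As Word.Document)
    Dim vWord As Variant
    For Each vWord In Array("invoice", "total", "VAT")
        Debug.Print vWord, DocContains(oDoc, CStr(vWord))
    Next vWord
End Sub
```

Using a separate Range per search keeps each lookup independent, so the order of the words in the list does not affect the results.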
-
Apr 17th, 2002, 01:52 PM
#3
Hyperactive Member
I assume that you are trying to read the .doc file directly, the same way you read the .txt. This won't work because Micro$oft uses a proprietary encoding standard to encrypt (somewhat) the contents of your files. This means that you can only open and read your .doc file using Micro$oft's preferred tool: MS Word.
You need to actually automate Word to open the file; then you can easily access the Find functionality built into Word.
I am sure many postings have been made on this forum on how to automate Word to do the above, so search around.
-mcd
[vbcode]
'*****************************
MsgBox "MCD :: [email protected]", vbInformation + vbOKOnly, "User"
'*****************************
[/vbcode]
-
Apr 17th, 2002, 01:53 PM
#4
Hyperactive Member
And there we have it. peet got to it before I could even submit my posting! That's what you need.
-
Apr 17th, 2002, 01:56 PM
#5
-
Apr 17th, 2002, 02:00 PM
#6
Hyperactive Member
Well, from what I see of your code around this forum, it's clean enough that someone should be able to read it without comments.
I think comments are evil, but you have to write them, especially in production systems.
-mcd
-
Apr 18th, 2002, 11:10 PM
#7
Quote:
Micro$oft uses an proprietary encoding standard to encrypt (somewhat) the contents of your files. This means that you can only open and read your .doc file using micro$ofts preferred tool.. MS WORD..
you need to actually automate word to open up the file
Not true.
DOC files are not any more "encrypted" than RTF or HTML files. The file has to hold Word's formatting codes, and Word can do a lot of formatting, so you get more "junk" than in RTF, TXT or HTM files (but it isn't encrypted to work only in Word, as PDF files apparently are). The problem with opening Word files like text files is that a DOC file has a Chr(26) near the beginning, just after the "ÐÏ à¡±" bytes. In text mode that character is read as end of file, so you get an "Input past end of file" error. So you need to read the file in Binary mode. Then the trick is to separate the actual document text from the formatting codes. I'm working with Word 2000, and it seems to be consistent in the binary format. This should get your file text without using Word at all:
VB Code:
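Private Sub Command1_Click()
    GetWordDocText
End Sub

Sub GetWordDocText()
    Dim strDocPath As String
    Dim strDocText As String
    ' Long, not Integer: .doc files easily exceed 32,767 bytes
    Dim lngDocTextStart As Long
    Dim lngDocTextEnd As Long
    Dim intFile As Integer
    strDocPath = "C:\My Documents\BlahBlah.doc"
    ' Read the whole file in Binary mode so Chr(26) is treated as data
    intFile = FreeFile
    Open strDocPath For Binary Access Read As #intFile
    strDocText = Space$(LOF(intFile))
    Get #intFile, , strDocText
    Close #intFile
    ' Find key character just before the actual text.
    lngDocTextStart = InStr(1, strDocText, "Ù")
    ' Give up if the marker is missing or too close to the end of the file.
    If lngDocTextStart = 0 Or lngDocTextStart + 4 > Len(strDocText) Then
        MsgBox "Could not locate the document text in this file."
        Exit Sub
    End If
    ' Text starts after the first Ù & 3 null characters.
    lngDocTextStart = lngDocTextStart + 4
    ' Text ends with a null (or at the end of the file if none is found).
    lngDocTextEnd = InStr(lngDocTextStart, strDocText, Chr$(0))
    If lngDocTextEnd = 0 Then lngDocTextEnd = Len(strDocText) + 1
    strDocText = Mid$(strDocText, lngDocTextStart, lngDocTextEnd - lngDocTextStart)
    MsgBox strDocText
End Sub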
ps: I believe the file format is RichEdit3.0.
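For reference, Word 97-2003 .doc files (Word 2000 included) are stored in Microsoft's OLE2 Compound File Binary container, whose 8-byte header is D0 CF 11 E0 A1 B1 1A E1. Read through the Windows ANSI code page, the first six bytes are the "ÐÏ à¡±" seen above (&H11 is a control character and shows up as a blank), and the seventh byte, &H1A, is the Chr(26) that trips up text-mode reads. A minimal VB6 sketch, with hypothetical function names `HasOle2Signature` and `FileHasOle2Signature`, checks that header in Binary mode before trying the "Ù" text-extraction trick. It only identifies the container; it does not locate the text.

```vb
' Hypothetical helpers (not from this thread). Word 97-2003 .doc files
' begin with the OLE2 compound file signature D0 CF 11 E0 A1 B1 1A E1.

' True if the byte array begins with the OLE2 signature.
Public Function HasOle2Signature(abHeader() As Byte) As Boolean
    Dim vSig As Variant
    Dim i As Long
    ' VBA.Array is always zero-based, regardless of Option Base
    vSig = VBA.Array(&HD0, &HCF, &H11, &HE0, &HA1, &HB1, &H1A, &HE1)
    HasOle2Signature = False
    ' Too short to hold the 8-byte signature
    If UBound(abHeader) - LBound(abHeader) + 1 < 8 Then Exit Function
    For i = 0 To 7
        If abHeader(LBound(abHeader) + i) <> vSig(i) Then Exit Function
    Next i
    HasOle2Signature = True
End Function

' Reads the first 8 bytes in Binary mode (so Chr(26) is just data)
' and compares them with the signature.
Public Function FileHasOle2Signature(ByVal sPath As String) As Boolean
    Dim iFile As Integer
    Dim abHeader() As Byte
    iFile = FreeFile
    Open sPath For Binary Access Read As #iFile
    If LOF(iFile) >= 8 Then
        ReDim abHeader(0 To 7)
        ' Get fills the whole array: exactly 8 bytes from position 1
        Get #iFile, 1, abHeader
        FileHasOle2Signature = HasOle2Signature(abHeader)
    End If
    Close #iFile
End Function
```

For example, `Debug.Print FileHasOle2Signature("C:\My Documents\BlahBlah.doc")` can be run before `GetWordDocText` to confirm the file really is a binary .doc.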
-
Apr 19th, 2002, 01:56 PM
#8
Hyperactive Member
Excellent information you provided here, WorkHorse. I didn't know that you could read a .doc without Word by using a binary read. I was always under the impression that, while .docs were not actually encrypted, they were encoded in a semi-plain-text manner that you could sort of read, but not get complete, accurate information from unless you used Word (take, for example, all of this junk in the file: "ÿ ÿ ö Ö€€€ €€€ Ö€€€").
Thanks for the information!
-mcd