-
Jan 3rd, 2009, 05:41 AM
#1
Thread Starter
Hyperactive Member
[RESOLVED] Is this word in this Word 2007 document ?
I have the path to a Word 2007 document, and a text string.
How do I find out if that text string appears at least once as a whole word in that document?
-
Jan 3rd, 2009, 05:44 AM
#2
Re: Is this word in this Word 2007 document ?
Opening the docx and doing a .Find is one way, but I know there is another way that doesn't require you to open the document. Have you tried a search yet?
VB/Office Guru™ (AKA: Gangsta Yoda™ ®)
I dont answer coding questions via PM. Please post a thread in the appropriate forum. 
Microsoft MVP 2006-2011
Office Development FAQ (C#, VB.NET, VB 6, VBA)
Senior Jedi Software Engineer MCP (VB 6 & .NET), BSEE, CET
If a post has helped you then Please Rate it! 
• Reps & Rating Posts • VS.NET on Vista • Multiple .NET Framework Versions • Office Primary Interop Assemblies • VB/Office Guru™ Word SpellChecker™.NET • VB/Office Guru™ Word SpellChecker™ VB6 • VB.NET Attributes Ex. • Outlook Global Address List • API Viewer utility • .NET API Viewer Utility •
System: Intel i7 6850K, Geforce GTX1060, Samsung M.2 1 TB & SATA 500 GB, 32 GBs DDR4 3300 Quad Channel RAM, 2 Viewsonic 24" LCDs, Windows 10, Office 2016, VS 2019, VB6 SP6 
-
Jan 3rd, 2009, 05:59 AM
#3
Thread Starter
Hyperactive Member
Re: Is this word in this Word 2007 document ?
...I liked your second option.
I tried to Open FileName.docx As Input As #1, but that was no good.
I thought I could just cycle through each record and do an Instr on each record.
Would opening as Binary work ?
-
Jan 3rd, 2009, 06:26 AM
#4
Re: Is this word in this Word 2007 document ?
You could probably save a copy of that word file as RTF then load it in a RichTextBox control and do the searching there?
-
Jan 3rd, 2009, 06:33 AM
#5
Thread Starter
Hyperactive Member
Re: Is this word in this Word 2007 document ?
...no I need to loop through one or more folders looking for Word files and searching each. For each file I need to look if one or more words appear, observing AND, OR and parentheses. I've coded that bit. I just need to find if a word is in a document.
-
Jan 3rd, 2009, 11:25 AM
#6
Addicted Member
Re: Is this word in this Word 2007 document ?
You are thinking on the right track. Opening the file can be done using the binary method and searching can be done using Instr.
The real key to success in this type of program is how you code the compare routines. Things like case match, exact match, like operator matches, fuzzy searches, etc. are important. Also coding for various wildcard operators is important. To give you a small, tiny example... suppose you want to find a document with the word "man". Do you want a return for words like woman, Manfred Mann, Manitoba? What about a return for the word "men"? The list goes on and on.
There are samples of program code around on various VB web sites. Off the top of my head I'm pretty sure Planet Source Code has some sample programs.
-
Jan 3rd, 2009, 04:37 PM
#7
Thread Starter
Hyperactive Member
Re: Is this word in this Word 2007 document ?
Thanks Tom. I had a look at Planet Source Code but couldn't find the sort of code I need - it might be there but its search facility is not good.
I don't need any sort of fuzzy matching. Just "is this word in this document".
-
Jan 3rd, 2009, 11:17 PM
#8
Addicted Member
Re: Is this word in this Word 2007 document ?
Hi Robert:
I quickly looked through my snippets collection and found this. I assume you have the code to find the path/filename of the files you want to search. Once you have the list you could use this to search the files for your text matching. Important: It does require a reference to the Microsoft Scripting Runtime.
Declarations:
Code:
Private Const ForReading = 1 'FileSystemObject constants
Private Const ForWriting = 2
Private Const ForAppending = 8
Public FileList As New Collection 'List of files to search
Public Results As New Collection 'Results filenames
Public pos As New Collection 'Results position in file
Code:
Code:
Public Sub AddFile(path As String, filename As String)
    'Add files to list. Wildcards allowed; path must end with a backslash
    Dim s As String
    s = Dir(path + filename)
    Do While s <> ""
        FileList.Add path + s, path + s
        s = Dir
    Loop
End Sub

Public Sub ClearFileList()
    'Clear the files list
    Dim i As Integer
    For i = 1 To FileList.Count
        FileList.Remove 1
    Next i
End Sub

Public Function Find(st As String) As Integer
    'Find st in the files listed. Returns the number of results
    Dim fso As Object, fil As Object, ts As Object
    Dim fn As Variant
    Dim tx As String
    Dim i As Long 'Long, because InStr positions can exceed 32767
    Find = 0
    Set fso = CreateObject("Scripting.FileSystemObject")
    'Clear the results of any previous search (both collections)
    For i = 1 To Results.Count
        Results.Remove 1
    Next i
    For i = 1 To pos.Count
        pos.Remove 1
    Next i
    For Each fn In FileList
        Set fil = fso.GetFile(fn)
        Set ts = fil.OpenAsTextStream(ForReading)
        'ReadAll raises an error on an empty file, so check first
        If ts.AtEndOfStream Then tx = "" Else tx = ts.ReadAll
        'Case-sensitive substring match (not a whole-word match)
        i = InStr(1, tx, st)
        Do While i > 0
            Find = Find + 1
            Results.Add fn
            pos.Add i
            i = InStr(i + 1, tx, st)
        Loop
        ts.Close
    Next fn
End Function
Hopefully that will get you started.
-
Jan 4th, 2009, 01:27 AM
#9
Addicted Member
Re: Is this word in this Word 2007 document ?
Hey Robert:
Go to this thread. Download the FindFiles.zip in post #3. This search class project may make your life a whole lot easier.
http://www.vbforums.com/showthread.p...earch+Dir+text
Tom
-
Jan 4th, 2009, 04:35 AM
#10
Thread Starter
Hyperactive Member
Re: Is this word in this Word 2007 document ?
Thanks Tom. That code had 2 problems.
1# my test string was in 2 files, but I only got 1 back.
2# I need to find whole words - this code finds the requested string inside any text.
The link seems to be more concerned with checking multiple files.
I have done all the other work. All I need is this...
I have the path of a .doc or .docx file, and a word. I just want a yes/no if that word appears at least once in the file.
It will operate under Word 2007.
The test should be case-insensitive - not a crucial issue
The test should be for a whole word - this is crucial
-
Jan 4th, 2009, 05:14 AM
#11
Re: Is this word in this Word 2007 document ?
record a macro of using search inside word, you can then adapt that code to the word object you are creating in a vb6 loop
2# I need to find whole words - this code finds the requested string inside any text.
to get whole words only, put a space at the beginning and end of the search string, though you will have a problem with punctuation marks if the search string is followed by one
you could just put a space before the search string, then check if the next character after the length of the search string is a space or punctuation
Last edited by westconn1; Jan 4th, 2009 at 05:19 AM.
i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next
dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part
come back and mark your original post as resolved if your problem is fixed
pete
-
Jan 4th, 2009, 05:20 AM
#12
Re: Is this word in this Word 2007 document ?
 Originally Posted by RobertLees
Thanks Tom. That code had 2 problems.
1# my test string was in 2 files, but I only got 1 back.
2# I need to find whole words - this code finds the requested string inside any text.
The link seems to be more concerned with checking multiple files.
I have done all the other work. All I need is this...
I have the path of a .doc or .docx file, and a word. I just want a yes/no if that word appears at least once in the file.
It will operate under Word 2007.
The test should be case-insensitive - not a crucial issue
The test should be for a whole word - this is crucial
To open each document with Word and search will be slow (as I mentioned in my first post) but it will be the most accurate. By using the Word Object Model and Late Binding you will have an app that is more stable and supportive of multiple versions of Word.
Edit: Record a macro
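A minimal late-bound sketch of the Word Object Model route, for anyone who wants a starting point. It assumes Word is installed and the path is valid. WordDocContains is a made-up name, and 0 is used in place of the wdFindStop constant because named constants are not available with late binding. It only searches the main body of the document (Content), not headers, footers or text boxes, and it has no error handling.

```vb
' Hypothetical helper: True if sWord occurs as a whole word in the document body.
Public Function WordDocContains(ByVal sPath As String, ByVal sWord As String) As Boolean
    Dim oWord As Object   ' Word.Application (late bound)
    Dim oDoc As Object    ' Word.Document
    Dim oRange As Object  ' Word.Range

    Set oWord = CreateObject("Word.Application")
    oWord.Visible = False
    Set oDoc = oWord.Documents.Open(sPath, , True)   ' third argument is ReadOnly

    Set oRange = oDoc.Content
    With oRange.Find
        .ClearFormatting
        .Text = sWord
        .Forward = True
        .Wrap = 0               ' wdFindStop: do not wrap around
        .MatchCase = False
        .MatchWholeWord = True  ' "ist" will not match inside "vista"
        .MatchWildcards = False
        WordDocContains = .Execute
    End With

    oDoc.Close False            ' close without saving
    oWord.Quit
    Set oRange = Nothing
    Set oDoc = Nothing
    Set oWord = Nothing
End Function
```

When looping over a folder it would probably be faster to create the Word.Application object once outside the loop and only open and close each document inside it.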
-
Jan 4th, 2009, 09:03 AM
#13
Addicted Member
Re: Is this word in this Word 2007 document ?
The FindFiles .zip may be a bit over the top for what you want. Did you try the FileSystemObject code I posted? Find function has the code to search the file.
I need to find whole words - this code finds the requested string inside any text
This is exactly what I was referring to in my earlier post:
... suppose you want to find a document with the word "man". Do you want a return for words like woman, Manfred Mann, Manitoba?
To avoid returning words like woman you could search for the string " man" with a leading space... but then it would find words like Manitoba. To avoid that your search string could then be " man " with a leading and trailing space. Now the problem would be if the word man was followed by a punctuation mark such as " man, ". The solution there is to check the character after the "n" in man. You could also search with an OR condition, that is, If String = " man " Or String = " man." Or String = " man, ", etc.
There are many ways to open the file and search as you have seen and now have code for... but unless you code the search for these considerations your search may be less than accurate. This is true for any search routine or program, including Word itself or Windows Search.
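One way to sketch the "check the character before and after" idea in code is shown here. The function names are invented for illustration, and the sketch assumes that a "word character" is an unaccented letter or a digit, which may be too narrow for some documents.

```vb
' Hypothetical helper: True if sWord occurs in sText with no letter or digit
' immediately before or after it. The comparison is case-insensitive.
Public Function ContainsWholeWord(ByVal sText As String, ByVal sWord As String) As Boolean
    Dim lPos As Long
    Dim lAfter As Long

    If Len(sWord) = 0 Then Exit Function

    lPos = InStr(1, sText, sWord, vbTextCompare)
    Do While lPos > 0
        lAfter = lPos + Len(sWord)
        If Not IsWordChar(sText, lPos - 1) And Not IsWordChar(sText, lAfter) Then
            ContainsWholeWord = True
            Exit Function
        End If
        ' Not a whole word here, so keep looking further along
        lPos = InStr(lPos + 1, sText, sWord, vbTextCompare)
    Loop
End Function

' True if the character at lIndex is a letter or digit.
' Positions outside the string count as word boundaries.
Private Function IsWordChar(ByVal sText As String, ByVal lIndex As Long) As Boolean
    If lIndex < 1 Or lIndex > Len(sText) Then Exit Function
    IsWordChar = Mid$(sText, lIndex, 1) Like "[A-Za-z0-9]"
End Function
```

With this, searching "woman from Manitoba" for "man" should come back False, while "a man, a plan" should come back True, because the comma and the space both count as boundaries.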
-
Jan 4th, 2009, 03:09 PM
#14
Thread Starter
Hyperactive Member
Re: Is this word in this Word 2007 document ?
Initially I was testing with frm files in my development folder. Open strFileName For Input As #1 then read each record. Make the record lowercase, and replace each punctuation character with a space. Then do an InStr to see if the whole word, case-insensitive, exists. When I tried this with a doc file, it returned an early EOF - must have been something in the non-ASCII data.
I thought of opening as Binary which might work.
Sure RobDog888 it would be slow opening each with Word, but I don't think there would be enough files for this to be a huge problem.
Had to laugh at your comment "multiple versions of Word". I am trying to replace the use of FileSearch which Microsoft dropped in 2007.
I need to have a class that requires as little change to the app as possible. Therefore it is to obey the app's use of FileSearch properties. In doing this I can offer recognition of AND OR and parentheses.
I've done all this. It works fine with text files.
Word does this. I must be able to utilise this capability.
-
Jan 4th, 2009, 03:34 PM
#15
Re: Is this word in this Word 2007 document ?
When I tried this with a doc file, it returned an early EOF - must have been something in the non-ASCII data.
opening for binary will probably avoid this
pete
-
Jan 4th, 2009, 05:02 PM
#16
Re: Is this word in this Word 2007 document ?
 Originally Posted by RobertLees
I am trying to replace the use of FileSearch which Microsoft dropped in 2007.
FileSearch was never mentioned in this thread and if you did support multiple versions of Word you could use it if the user was using 2003 or earlier.
-
Jan 4th, 2009, 08:58 PM
#17
Re: Is this word in this Word 2007 document ?
FileSearch was never mentioned in this thread and if y
i noticed that, but there was some other thread by the op in office development, on this topic
i guess filesearch would be a way to do it without opening each file, though i guess they would be able to read the word file format easily enough, wonder why it is no longer featured in word
pete
-
Jan 4th, 2009, 09:54 PM
#18
Re: Is this word in this Word 2007 document ?
Robert
Have you tried something like...
Code:
Open worddoc For Input As #1
Line Input #1, textline ' textline is a String variable that receives one line per call
.. where worddoc would be the full pathname of your Word doc.
and the second statement would be used in a loop?
I haven't tried that specifically with a Word doc, but do it all the time
with .txt files. I did at least copy a Word doc and renamed it with a
.txt extension, then did a simple search. Non-printable characters are
skipped.. actual text is readable. I would imagine that Open..For Input
would work in a similar manner (but I may be wrong).
Spoo
-
Jan 5th, 2009, 03:01 AM
#19
Re: Is this word in this Word 2007 document ?
Have you tried something like...
see post #14
pete
-
Jan 5th, 2009, 01:13 PM
#20
Re: Is this word in this Word 2007 document ?
 Originally Posted by westconn1
see post #14
Haha.. thanks. I need better glasses
-
Jan 5th, 2009, 01:23 PM
#21
Re: Is this word in this Word 2007 document ?
 Originally Posted by RobertLees
Initially I was testing with frm files in my development folder. Open strFileName For Input As #1 then read each record. Make the record lowercase, and replace each punctuation character with a space. Then do an InStr to see if the whole word, case-insensitive, exists. When I tried this with a doc file, it returned an early EOF - must have been something in the non-ASCII data.
Hey, did you try this?
1. copy the .doc file to a new folder
2. rename .doc file to a .txt file
3. do you still get a premature EOF ??
Spoo
-
Jan 5th, 2009, 03:38 PM
#22
Re: Is this word in this Word 2007 document ?
if the file contains an EOF character (ctrl Z) it will give a read past end of file error no matter what the file name / type, when opened for input; it should work ok if opened for binary
pete
-
Jan 5th, 2009, 04:04 PM
#23
Re: Is this word in this Word 2007 document ?
Good point.. that's what gave the binary approach the special sauce.
-
Jan 5th, 2009, 04:30 PM
#24
Thread Starter
Hyperactive Member
Re: Is this word in this Word 2007 document ?
...thanks everyone.
I am pretty sure opening for binary would work, but I abandoned that approach because I thought there would be a way to use Word's inbuilt search facility.
I inserted this ----- my PC uses (VISTA) OS ----- in a Word document, searched for vista, and I found it. I thought I was on the right track. BUT it also found ist.
Back to the binary approach. I'll have to do a bit of research into this, but if someone can get me started with binary, it would be appreciated
-
Jan 5th, 2009, 04:53 PM
#25
Re: Is this word in this Word 2007 document ?
Robert
Something like this might do the trick:
Code:
Dim aaBDVid()
PPath = "d:\bill's stuff\programs for bill\"
' 0. open file
vname = "movie for rob.txt"
vv = PPath + vname
' 1. use Get - create array
Close #1
Open vv For Binary As #1
ReDim aaBDVid(FileLen(vv) - 1) ' zero-based, so this holds exactly FileLen(vv) elements
Get #1, , aaBDVid
Close #1
'
Sorry for obtuse names (it was to read an AVI video file),
but the logic should be the same. Natch, change file names
and extensions to meet your needs. Basically, you
1. Dim an array
2. Open the file as binary
3. Dump the contents into the array (which will be a 1-D array)
4. Close the file, and then work from the array
The contents of the array will essentially be the ASCII code of
each character in the file (printable and non-printable).
Your task will then be to convert your search word into ASCII,
and then loop through the array looking at nn elements at a time,
where nn would be the length of the word you are searching for
(with appropriate lead space and following space|punctuation, etc.).
HTH
Spoo
Last edited by Spoo; Jan 5th, 2009 at 05:03 PM.
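A rough sketch of the "loop through the array" step described above, under some stated assumptions: BytesContain is a made-up name, the array has already been filled from the file and is declared As Byte, and the text in the file is stored one byte per character. That last point will not always hold. A .doc file may store its text as two-byte Unicode, and a .docx file is a ZIP package whose XML is compressed, so a raw byte search cannot be expected to find words in a .docx. The match here is case-sensitive and does not check word boundaries.

```vb
' Hypothetical helper: True if the ANSI bytes of sWord appear somewhere in abData.
' abData must already be dimensioned (for example, filled with Get from a binary file).
Public Function BytesContain(ByRef abData() As Byte, ByVal sWord As String) As Boolean
    Dim abWord() As Byte
    Dim i As Long, j As Long
    Dim nLen As Long

    nLen = Len(sWord)
    If nLen = 0 Then Exit Function

    ' Convert the search word to one ANSI byte per character (zero-based array)
    abWord = StrConv(sWord, vbFromUnicode)

    For i = LBound(abData) To UBound(abData) - nLen + 1
        ' Compare nLen bytes starting at position i
        For j = 0 To nLen - 1
            If abData(i + j) <> abWord(j) Then Exit For
        Next j
        ' j only reaches nLen when every byte matched
        If j = nLen Then
            BytesContain = True
            Exit Function
        End If
    Next i
End Function
```

For a case-insensitive search, one option would be to lowercase both the word and a copy of the data before comparing.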
-
Jan 5th, 2009, 06:23 PM
#26
Thread Starter
Hyperactive Member
Re: Is this word in this Word 2007 document ?
Thanks Spoo. That is the sort of info I wanted.
-
Jan 5th, 2009, 08:37 PM
#27
Thread Starter
Hyperactive Member
Re: Is this word in this Word 2007 document ?
When I got to Get #1, , aaBDVid, it gave error 458 - Variable uses an Automation type not supported in Visual Basic
-
Jan 5th, 2009, 09:24 PM
#28
Re: Is this word in this Word 2007 document ?
once opened for binary, you can also read the file using input or line input
depending on the size of your files i would just read entire file into a single string, then use instr to find if your search string was included in the file
vb Code:
Open "Somefile.doc" For Binary As #1
mystr = Input(LOF(1), #1)
Close #1
pos = InStr(1, mystr, searchstr, vbTextCompare)
If pos > 0 Then MsgBox "found in this file"
i know this will work in text files containing end of file characters, and i have tested with word doc, but i can not promise that it will always find the correct answers
this being the case, the same code will work as you use for text files, except open for binary
pete
-
Jan 5th, 2009, 10:51 PM
#29
Addicted Member
Re: Is this word in this Word 2007 document ?
Robert:
Here is a search function designed using the binary method. You may have to make some adaptations but this is the basic method for searching in binary mode:
Code:
Function InFileSearch(ByVal sFile As String, Optional ByVal str As String = "") As Boolean
    On Error GoTo Errhandler
    Dim f As Integer
    Dim Buf As String
    Dim BufLen As Long
    Dim FoundPos As Long

    'Make sure they entered a file and string to search
    str = Trim$(str)
    If str = "" Then Exit Function
    If Trim$(sFile) = "" Then Exit Function

    'case insensitive so make all lower case
    str = LCase$(str)

    'Open File for Binary Read
    f = FreeFile
    Open sFile For Binary Access Read As f
    BufLen = LOF(f) 'FileLen(sFile)
    If BufLen = 0 Then 'empty file so exit out
        Close f
        Exit Function
    End If

    'create buffer string to hold the file data
    Buf = Space$(BufLen)
    Get #f, , Buf

    'look for case insensitive string
    FoundPos = InStr(LCase$(Buf), LCase$(str))
    If FoundPos Then
        InFileSearch = True
    Else
        InFileSearch = False
    End If

    Close f
    Buf = ""
    Exit Function

Errhandler:
    Close f
    Buf = ""
    MsgBox Error$, vbOKOnly, "Error"
End Function
The function returns true if found, false if not found.
-
Jan 6th, 2009, 07:58 AM
#30
Re: Is this word in this Word 2007 document ?
 Originally Posted by RobertLees
When I got to Get #1, , aaBDVid, it gave error 458 - Variable uses an Automation type not supported in Visual Basic
Sorry, seems that the Dim statement in my earlier post was incomplete.
I just checked that app and found that I did the following in the
Declarations section of the form:
Code:
Dim aaBDVid() As Byte
The Get statement is the one that is populating the array for you, thus
the array type is important.
I hope that does the trick
Spoo
Last edited by Spoo; Jan 6th, 2009 at 09:28 AM.