I have the path to a Word 2007 document, and a text string.
How do I find out if that text string appears at least once as a whole word in that document.
Opening the .docx and doing a .Find is one way, but I know there is another way that doesn't require you to open the document. Have you tried a search yet?
...I liked your second option.
I tried Open FileName.docx For Input As #1, but that was no good.
I thought I could just cycle through each record and do an InStr on each record.
Would opening it For Binary work?
You could probably save a copy of that word file as RTF then load it in a RichTextBox control and do the searching there?
...no, I need to loop through one or more folders looking for Word files and searching each. For each file I need to check whether one or more words appear, observing AND, OR and parentheses. I've coded that bit. I just need to find out if a word is in a document.
You are thinking on the right track. Opening the file can be done using the binary method and searching can be done using Instr.
The real key to success in this type of program is how you code the compare routines. Things like case match, exact match, like operator matches, fuzzy searches, etc. are important. Also coding for various wildcard operators is important. To give you a small, tiny example... suppose you want to find a document with the word "man". Do you want a return for words like woman, Manfred Mann, Manitoba? What about a return for the word "men"? The list goes on and on.
There are samples of program code around on various VB web sites. Off the top of my head I'm pretty sure Planet Source Code has some sample programs.
Thanks Tom. I had a look at Planet Source Code but couldn't find the sort of code I need - it might be there, but its search facility is not good.
I don't need any sort of fuzzy matching. Just "is this word in this document?"
Hi Robert:
I quickly looked through my snippets collection and found this. I assume you have the code to find the path/filename of the files you want to search. Once you have the list, you could use this to search the files for your matching text. Important: it uses the Microsoft Scripting Runtime (scrrun.dll); because the code creates the object with CreateObject, a project reference is only needed if you switch to early binding:
Declarations:
Code:
Private Const ForReading = 1 'FileSystemObject constants
Private Const ForWriting = 2
Private Const ForAppending = 8
Public FileList As New Collection 'List of files to search
Public Results As New Collection 'Results filenames
Public pos As New Collection 'Results position in file
Code:
Public Sub AddFile(path As String, filename As String)
    'Add files to the list. Wildcards allowed
    Dim s As String
    s = Dir(path + filename)
    Do While s <> ""
        FileList.Add path + s, path + s
        s = Dir
    Loop
End Sub
Public Sub ClearFileList()
    'Clear the files list
    Dim i As Long
    For i = 1 To FileList.Count
        FileList.Remove 1
    Next i
End Sub
Public Function Find(st As String) As Long
    'Find st in the listed files. Returns the number of matches.
    'Note: a plain substring search (case-sensitive under the default
    'Option Compare Binary), so "man" also matches "woman"
    Dim tx As String, i As Long   'Long: InStr positions can exceed 32767
    Dim fso As Object, fil As Object, ts As Object
    Dim fn As Variant
    Find = 0
    Set fso = CreateObject("Scripting.FileSystemObject")
    'Clear the results of any previous search (both collections)
    For i = 1 To Results.Count
        Results.Remove 1
    Next i
    For i = 1 To pos.Count
        pos.Remove 1
    Next i
    For Each fn In FileList
        Set fil = fso.GetFile(fn)
        Set ts = fil.OpenAsTextStream(ForReading)
        'ReadAll raises an error on an empty file, so check first
        If ts.AtEndOfStream Then tx = "" Else tx = ts.ReadAll
        ts.Close
        i = InStr(1, tx, st)
        Do While i > 0
            Find = Find + 1
            Results.Add fn
            pos.Add i
            i = InStr(i + 1, tx, st)
        Loop
    Next fn
End Function
Hopefully that will get you started.
Hey Robert:
Go to this thread. Download the FindFiles.zip in post #3. This search class project may make your life a whole lot easier.
http://www.vbforums.com/showthread.p...earch+Dir+text
Tom
Thanks Tom. That code had 2 problems.
1# my test string was in 2 files, but I only got 1 back.
2# I need to find whole words - this code finds the requested string inside any text.
The link seems to be more concerned with checking multiple files.
I have done all the other work. All I need is this...
I have the path of a .doc or .docx file, and a word. I just want a yes/no if that word appears at least once in the file.
It will operate under Word 2007.
The test should be case-insensitive - not a crucial issue.
The test should be for a whole word - this is crucial.
record a macro of a search inside word; you can then adapt that code to the word object you are creating in a vb6 loop
to get whole words only, put a space at the beginning and end of the search string, though you will have a problem with punctuation marks if the search string is followed by one
Quote:
2# I need to find whole words - this code finds the requested string inside any text.
you could just put a space before the search string, then check whether the character right after the search string is a space or punctuation
Opening each document with Word to search it will be slow (as I mentioned in my first post), but it will be the most accurate. By using the Word Object Model and Late Binding you will have an app that is more stable and supports multiple versions of Word.
Quote:
Originally Posted by RobertLees
Edit: Record a macro
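A minimal sketch of the Word Object Model route with late binding, assuming Word is installed on the machine running the code. WordHasWholeWord is a hypothetical helper name; the Find settings are the ones a recorded macro produces when "Find whole words only" is ticked. With MatchWholeWord set, searching the text "my PC uses (VISTA) OS" for vista should succeed while a search for ist should not.

```vb
' Sketch only: late-bound Word automation, so no project reference is needed.
' Assumes Word is installed. WordHasWholeWord is a hypothetical name.
Public Function WordHasWholeWord(ByVal sPath As String, ByVal sWord As String) As Boolean
    Dim wdApp As Object, wdDoc As Object
    On Error GoTo CleanUp
    Set wdApp = CreateObject("Word.Application")
    wdApp.Visible = False
    ' Open read-only so the file is never modified
    Set wdDoc = wdApp.Documents.Open(FileName:=sPath, ReadOnly:=True, AddToRecentFiles:=False)
    With wdDoc.Content.Find
        .ClearFormatting
        .Text = sWord
        .MatchCase = False          ' case-insensitive
        .MatchWholeWord = True      ' "Find whole words only"
        .MatchWildcards = False
        .Forward = True
        .Wrap = 0                   ' wdFindStop: do not wrap around
        WordHasWholeWord = .Execute ' True if at least one match was found
    End With
CleanUp:
    ' Reached on success and on error: close without saving, then quit Word
    If Not wdDoc Is Nothing Then wdDoc.Close SaveChanges:=False
    If Not wdApp Is Nothing Then wdApp.Quit
    Set wdDoc = Nothing
    Set wdApp = Nothing
End Function
```

For many files it would likely be faster to create Word.Application once outside the folder loop and only open and close each document inside it; the sketch starts Word per call just to stay self-contained.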
FindFiles.zip may be a bit over the top for what you want. Did you try the FileSystemObject code I posted? The Find function has the code to search the file.
This is exactly what I was referring to in my earlier post:
Quote:
I need to find whole words - this code finds the requested string inside any text
To avoid returning words like woman you could search for the string " man" with a leading space... but then it would find words like Manitoba. To avoid that your search string could then be " man " with a leading and trailing space. Now the problem would be if the word man was followed by a punctuation mark such as " man, ". The solution there is to check the character after the "n" in man. You could also search with an OR condition, that is, if String = " man " Or String = " man." Or String = " man, ", etc.
Quote:
... suppose you want to find a document with the word "man". Do you want a return for words like woman, Manfred Mann, Manitoba?
There are many ways to open the file and search as you have seen and now have code for... but unless you code the search for these considerations your search may be less than accurate. This is true for any search routine or program, including Word itself or Windows Search.
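The character-checking idea above can be sketched as a small VB6 function: instead of listing " man ", " man.", " man, " and so on, it takes each InStr hit and looks at the single character on either side. This is a hedged sketch, not a complete solution: it assumes a "word character" is an ASCII letter or digit, so accented letters and apostrophes count as boundaries, and ContainsWholeWord and IsWordChar are hypothetical names.

```vb
' Sketch: a "word character" is assumed to be an ASCII letter or digit;
' anything else (space, punctuation, start or end of text) is a boundary.
Private Function IsWordChar(ByVal ch As String) As Boolean
    IsWordChar = (ch Like "[A-Za-z0-9]")   ' False for the empty string
End Function

' True if sWord occurs in sText as a whole word, ignoring case.
Public Function ContainsWholeWord(ByVal sText As String, ByVal sWord As String) As Boolean
    Dim p As Long, sBefore As String, sAfter As String
    If Len(sWord) = 0 Then Exit Function
    p = InStr(1, sText, sWord, vbTextCompare)
    Do While p > 0
        ' Character just before the hit (start of text counts as a boundary)
        If p > 1 Then sBefore = Mid$(sText, p - 1, 1) Else sBefore = " "
        ' Character just after the hit ("" when the hit ends the text)
        sAfter = Mid$(sText, p + Len(sWord), 1)
        If Not IsWordChar(sBefore) And Not IsWordChar(sAfter) Then
            ContainsWholeWord = True
            Exit Function
        End If
        p = InStr(p + 1, sText, sWord, vbTextCompare)   ' try the next hit
    Loop
End Function
```

For example, ContainsWholeWord("a man, a plan", "man") returns True, while ContainsWholeWord("woman of Manitoba", "man") returns False because every hit has a letter next to it.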
Initially I was testing with .frm files in my development folder: Open strFileName For Input As #1, then read each record, make the record lowercase, and replace each punctuation character with a space. Then do an InStr to see if the whole word exists, case-insensitively. When I tried this with a .doc file, it hit an early EOF - must have been something in the non-ASCII data.
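The lowercase-and-replace-punctuation approach described above can be sketched as one function, as an alternative to inspecting neighbouring characters. It is a sketch under assumptions: "punctuation" is taken to mean anything that is not an ASCII letter or digit, so a search word that itself contains punctuation (O'Brien, e-mail) would never match, and HasWholeWord is a hypothetical name.

```vb
' Sketch: lower-case the text, turn every non-letter/non-digit into a space,
' then look for the word padded with spaces on both sides.
Public Function HasWholeWord(ByVal sText As String, ByVal sWord As String) As Boolean
    Dim i As Long
    sWord = LCase$(Trim$(sWord))
    If Len(sWord) = 0 Then Exit Function
    sText = LCase$(sText)
    For i = 1 To Len(sText)
        ' The Mid$ statement overwrites one character in place
        If Not (Mid$(sText, i, 1) Like "[a-z0-9]") Then Mid$(sText, i, 1) = " "
    Next i
    ' Padding the text lets a match at the very start or end count too
    HasWholeWord = InStr(1, " " & sText & " ", " " & sWord & " ", vbBinaryCompare) > 0
End Function
```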
I thought of opening as Binary which might work.
Sure, RobDog888, it would be slow opening each with Word, but I don't think there would be enough files for this to be a huge problem.
Had to laugh at your comment "multiple versions of Word". I am trying to replace the use of FileSearch, which Microsoft dropped in 2007.
I need to have a class that requires as little change to the app as possible. Therefore it has to obey the app's use of FileSearch properties. In doing this I can offer recognition of AND, OR and parentheses.
I've done all this. It works fine with text files.
Word does this. I must be able to utilise this capability.
opening for binary will probably avoid this
Quote:
When I tried this with a doc file, it returned an early EOF - must have been something in the non-ASCII data.
FileSearch was never mentioned in this thread, and if you did support multiple versions of Word you could use it if the user was using 2003 or earlier.
Quote:
Originally Posted by RobertLees
i noticed that, but there was some other thread by the op in office development on this topic
Quote:
FileSearch was never mentioned in this thread and if y
i guess filesearch would be the way to do it without opening each file, though i guess they would be able to read the word file format easily enough; i wonder why it is no longer featured in word
Robert
Have you tried something like...
Code:
Open worddoc For Input As #1
Line Input #1, sLine
.. where worddoc would be the full pathname of your Word doc,
sLine is any String variable that receives each line,
and the second statement would be used in a loop?
I haven't tried that specifically with a Word doc, but I do it all the time
with .txt files. I did at least copy a Word doc and rename it with a
.txt extension, then did a simple search. Non-printable characters are
skipped.. actual text is readable. I would imagine that Open..For Input
would work in a similar manner (but I may be wrong).
Spoo
see post #14
Quote:
Have you tried something like...
Haha.. thanks. I need better glasses
Quote:
Originally Posted by westconn1
Hey, did you try this?
Quote:
Originally Posted by RobertLees
1. copy the .doc file to a new folder
2. rename .doc file to a .txt file
3. do you still get a premature EOF ??
Spoo
if the file contains an EOF character (Ctrl+Z), it will give an "Input past end of file" error when opened for input, no matter what the file name/type is; it should work ok if opened for binary
Good point.. that's what gave the binary approach the special sauce.
...thanks everyone.
I am pretty sure opening for binary would work, but I abandoned that approach because I thought there would be a way to use Word's inbuilt search facility.
I inserted this ----- my PC uses (VISTA) OS ----- in a Word document, searched for vista, and I found it. I thought I was on the right track. BUT it also found ist.
Back to the binary approach. I'll have to do a bit of research into this, but if someone can get me started with binary, it would be appreciated
Robert
Something like this might do the trick:
Code:
Dim aaBDVid()
PPath = "d:\bill's stuff\programs for bill\"
' 0. open file
vname = "movie for rob.txt"
vv = PPath + vname
' 1. use Get - create array
Close #1
Open vv For Binary As #1
ReDim aaBDVid(FileLen(vv))
Get #1, , aaBDVid
Close #1
'
Sorry for obtuse names (it was to read an AVI video file),
but the logic should be the same. Natch, change file names
and extensions to meet your needs. Basically, you
1. Dim an array
2. Open the file as binary
3. Dump the contents into the array (which will be a 1-D array)
4. Close the file, and then work from the array
The contents of the array will essentially be the ASCII code of
each character in the file (printable and non-printable).
Your task will then be to convert your search word into ASCII,
and then loop through the array looking at nn elements at a time,
where nn would be the length of the word you are searching for
(with appropriate lead space and following space|punctuation, etc.).
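The steps above can be sketched as a naive sliding-window search over the Byte array that Get fills. This assumes the array is declared As Byte and that the document stores its text as single-byte ANSI characters; FindBytes is a hypothetical name, and the match is case-sensitive with no whole-word check. Two hedged caveats: an older .doc may store some text as 16-bit Unicode, in which case assigning the search String straight to the Byte array (pat = sWord, which yields UTF-16LE bytes) is the pattern to try instead; and a .docx is a ZIP package whose text is compressed, so a raw byte search of a .docx will generally not find it at all.

```vb
' Sketch: naive search of a Byte array (as filled by Get) for the ANSI bytes of sWord.
' Returns the array index of the first matching byte, or -1 if there is none.
Public Function FindBytes(buf() As Byte, ByVal sWord As String) As Long
    Dim pat() As Byte, i As Long, j As Long, n As Long
    FindBytes = -1
    If Len(sWord) = 0 Then Exit Function
    pat = StrConv(sWord, vbFromUnicode)   ' VB strings are Unicode; convert to ANSI bytes
    n = UBound(pat) + 1
    ' Slide a window of n bytes along the array
    For i = LBound(buf) To UBound(buf) - n + 1
        For j = 0 To n - 1
            If buf(i + j) <> pat(j) Then Exit For
        Next j
        If j = n Then                      ' inner loop completed: all n bytes matched
            FindBytes = i
            Exit Function
        End If
    Next i
End Function
```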
HTH
Spoo
Thanks Spoo. That is the sort of info I wanted.
When I got to Get #1, , aaBDVid, it gave error 458 - Variable uses an Automation type not supported in Visual Basic
once opened for binary, you can also read the file using input or line input
depending on the size of your files i would just read entire file into a single string, then use instr to find if your search string was included in the file
i know this will work in text files containing end-of-file characters, and i have tested with a word doc, but i can not promise that it will always find the correct answers
Code:
Open "Somefile.doc" For Binary As #1
mystr = Input(LOF(1), #1)
Close #1
pos = InStr(1, mystr, searchstr, vbTextCompare)
If pos > 0 Then MsgBox "found in this file"
this being the case, the same code will work as you use for text files, except open for binary
Robert:
Here is a search function designed using the binary method. You may have to make some adaptations but this is the basic method for searching in binary mode:
Code:
Function InFileSearch(ByVal sFile As String, Optional ByVal str As String = "") As Boolean
    On Error GoTo Errhandler
    Dim f As Integer
    Dim Buf As String
    Dim BufLen As Long
    Dim FoundPos As Long
    'Make sure they entered a file and string to search
    str = Trim$(str)
    If str = "" Then Exit Function
    If Trim$(sFile) = "" Then Exit Function
    'case insensitive so make all lower case
    str = LCase$(str)
    'Open File for Binary Read
    f = FreeFile
    Open sFile For Binary Access Read As f
    BufLen = LOF(f) 'FileLen(sFile)
    If BufLen = 0 Then 'empty file so exit out
        Close f
        Exit Function
    End If
    'create buffer string to hold the file data
    Buf = Space$(BufLen)
    Get #f, , Buf
    'look for case insensitive string
    FoundPos = InStr(LCase$(Buf), LCase$(str))
    If FoundPos Then
        InFileSearch = True
    Else
        InFileSearch = False
    End If
    Close f
    Buf = ""
    Exit Function
Errhandler:
    Close f
    Buf = ""
    MsgBox Error$, vbOKOnly, "Error"
End Function
The function returns true if found, false if not found.
Sorry, seems that the Dim statement in my earlier post was incomplete.
Quote:
Originally Posted by RobertLees
I just checked that app and found that I did the following in the
Declarations section of the form:
Code:
Dim aaBDVid() As Byte
The Get statement is the one that is populating the array for you, thus
the array type is important.
I hope that does the trick
Spoo