
Thread: [RESOLVED] Is this word in this Word 2007 document ?

  1. #1

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2005
    Location
    Wellington, NZ
    Posts
    267

    [RESOLVED] Is this word in this Word 2007 document ?

    I have the path to a Word 2007 document, and a text string.

    How do I find out if that text string appears at least once as a whole word in that document?

  2. #2
    Ex-Super Mod RobDog888's Avatar
    Join Date
    Apr 2001
    Location
    LA, Calif. Raiders #1 AKA:Gangsta Yoda™
    Posts
    60,709

    Re: Is this word in this Word 2007 document ?

    Opening the docx and doing a .Find is one way, but I know there is another way that doesn't require you to open the document. Have you tried a search yet?
    VB/Office Guru™ (AKA: Gangsta Yoda®)
    I dont answer coding questions via PM. Please post a thread in the appropriate forum.

    Microsoft MVP 2006-2011
    Office Development FAQ (C#, VB.NET, VB 6, VBA)
    Senior Jedi Software Engineer MCP (VB 6 & .NET), BSEE, CET
    If a post has helped you then Please Rate it!
    Reps & Rating Posts | VS.NET on Vista | Multiple .NET Framework Versions | Office Primary Interop Assemblies | VB/Office Guru™ Word SpellChecker™ .NET | VB/Office Guru™ Word SpellChecker™ VB6 | VB.NET Attributes Ex. | Outlook Global Address List | API Viewer utility | .NET API Viewer Utility
    System: Intel i7 6850K, Geforce GTX1060, Samsung M.2 1 TB & SATA 500 GB, 32 GBs DDR4 3300 Quad Channel RAM, 2 Viewsonic 24" LCDs, Windows 10, Office 2016, VS 2019, VB6 SP6

  3. #3

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2005
    Location
    Wellington, NZ
    Posts
    267

    Re: Is this word in this Word 2007 document ?

    ...I liked your second option.

    I tried to Open FileName.docx For Input As #1, but that was no good.

    I thought I could just cycle through each record and do an Instr on each record.

    Would opening as Binary work ?

  4. #4
    Software Carpenter dee-u's Avatar
    Join Date
    Feb 2005
    Location
    Pinas
    Posts
    11,127

    Re: Is this word in this Word 2007 document ?

    You could probably save a copy of that word file as RTF then load it in a RichTextBox control and do the searching there?
    Regards,


    As a gesture of gratitude please consider rating helpful posts. c",)

    Some stuffs: Mouse Hotkey | Compress file using SQL Server! | WPF - Rounded Combobox | WPF - Notify Icon and Balloon | NetVerser - a WPF chatting system

  5. #5

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2005
    Location
    Wellington, NZ
    Posts
    267

    Re: Is this word in this Word 2007 document ?

    ...no I need to loop through one or more folders looking for Word files and searching each. For each file I need to look if one or more words appear, observing AND, OR and parentheses. I've coded that bit. I just need to find if a word is in a document.

  6. #6
    Addicted Member
    Join Date
    Jul 2007
    Posts
    228

    Re: Is this word in this Word 2007 document ?

    You are thinking on the right track. Opening the file can be done using the binary method and searching can be done using Instr.

    The real key to success in this type of program is how you code the compare routines. Things like case match, exact match, like operator matches, fuzzy searches, etc. are important. Also coding for various wildcard operators is important. To give you a small, tiny example... suppose you want to find a document with the word "man". Do you want a return for words like woman, Manfred Mann, Manitoba? What about a return for the word "men"? The list goes on and on.

    There are samples of program code around on various VB web sites. Off the top of my head I'm pretty sure Planet Source Code has some sample programs.

  7. #7

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2005
    Location
    Wellington, NZ
    Posts
    267

    Re: Is this word in this Word 2007 document ?

    Thanks Tom. I had a look at Planet Source Code but couldn't find the sort of code I need - it might be there but its search facility is not good.

    I don't need any sort of fuzzy matching. Just "is this word in this document".

  8. #8
    Addicted Member
    Join Date
    Jul 2007
    Posts
    228

    Re: Is this word in this Word 2007 document ?

    Hi Robert:

    I quickly looked through my snippets collection and found this. I assume you have the code to find the path/filename of the files you want to search. Once you have the list you could use this to search the files for your text matching. Important: it requires a reference to the Microsoft Scripting Runtime.

    Declarations:
    Code:
    Private Const ForReading = 1 'FileSystemObject constants
    Private Const ForWriting = 2
    Private Const ForAppending = 8
    
    Public FileList As New Collection 'List of files to search
    Public Results As New Collection 'Results filenames
    Public pos As New Collection 'Results position in file
    Code:
    Code:
    Public Sub AddFile(path As String, filename As String)
        'Add files to list. wildcards allowed
        Dim s As String
        s = Dir(path + filename)
        Do While s <> ""
            FileList.Add path + s, path + s
            s = Dir
        Loop
    End Sub
    
    Public Sub ClearFileList()
        'Clear the files list
        Dim i As Integer
        For i = 1 To FileList.Count
            FileList.Remove 1
        Next i
    End Sub
    
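    Public Function Find(st As String) As Integer
        'Find st in the files listed. Returns the number of results
        Dim fso As Object, fil As Object, ts As Object
        Dim fn As Variant
        Dim tx As String, i As Long 'Long: InStr positions can exceed 32767
        Find = 0
        Set fso = CreateObject("Scripting.FileSystemObject")
        'Clear the results and positions of any previous search
        Do While Results.Count > 0
            Results.Remove 1
        Loop
        Do While pos.Count > 0
            pos.Remove 1
        Loop
        For Each fn In FileList
            Set fil = fso.GetFile(fn)
            Set ts = fil.OpenAsTextStream(ForReading)
            'ReadAll raises an error on an empty file, so check first
            tx = ""
            If Not ts.AtEndOfStream Then tx = ts.ReadAll
            'Record every occurrence of st (substring match, case-sensitive)
            i = InStr(1, tx, st)
            Do While i > 0
                Find = Find + 1
                Results.Add fn
                pos.Add i
                i = InStr(i + 1, tx, st)
            Loop
            ts.Close
        Next fn
    End Function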
    Hopefully that will get you started.

  9. #9
    Addicted Member
    Join Date
    Jul 2007
    Posts
    228

    Re: Is this word in this Word 2007 document ?

    Hey Robert:

    Go to this thread. Download the FindFiles.zip in post #3. This search class project may make your life a whole lot easier.

    http://www.vbforums.com/showthread.p...earch+Dir+text

    Tom

  10. #10

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2005
    Location
    Wellington, NZ
    Posts
    267

    Re: Is this word in this Word 2007 document ?

    Thanks Tom. That code had 2 problems.

    1# my test string was in 2 files, but I only got 1 back.
    2# I need to find whole words - this code finds the requested string inside any text.

    The link seems to be more concerned with checking multiple files.

    I have done all the other work. All I need is this...

    I have the path of a .doc or .docx file, and a word. I just want a yes/no if that word appears at least once in the file.

    It will operate under Word 2007.
    The test should be case-insensitive - not a crucial issue
    The test should be for a whole word. - this is crucial

  11. #11
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Is this word in this Word 2007 document ?

    record a macro of using search inside word, you can then adapt that code to the word object you are creating in a vb6 loop

    2# I need to find whole words - this code finds the requested string inside any text.
    to get whole words only, put a space at the beginning and end of the search string, though you will have a problem with punctuation marks if the search string is followed by one
    you could just put a space before the search string, then check if the next character after the length of the search string is a space or punctuation
    Last edited by westconn1; Jan 4th, 2009 at 05:19 AM.
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  12. #12
    Ex-Super Mod RobDog888's Avatar
    Join Date
    Apr 2001
    Location
    LA, Calif. Raiders #1 AKA:Gangsta Yoda™
    Posts
    60,709

    Re: Is this word in this Word 2007 document ?

    Quote Originally Posted by RobertLees
    Thanks Tom. That code had 2 problems.

    1# my test string was in 2 files, but I only got 1 back.
    2# I need to find whole words - this code finds the requested string inside any text.

    The link seems to be more concerned with checking multiple files.

    I have done all the other work. All I need is this...

    I have the path of a .doc or .docx file, and a word. I just want a yes/no if that word appears at least once in the file.

    It will operate under Word 2007.
    The test should be case-insensitive - not a crucial issue
    The test should be for a whole word. - this is crucial
    To open each document with Word and search will be slow (as I mentioned in my first post) but it will be the most accurate. By using the Word Object Model and Late Binding you will have an app that is more stable and supportive of multiple versions of Word.
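
    A minimal sketch of the late-bound Word approach described above. The function name WordInDocument and its parameters are hypothetical, not from this thread. It relies on the Word object model's Find object with MatchWholeWord set, and it only searches the main body text (Document.Content), so headers, footers and text boxes would need extra work. Error handling is kept to cleanup only.

    ```vb
    ' Hypothetical helper: True if searchWord occurs as a whole word in the
    ' main body of the document at docPath. Uses late binding, so no
    ' reference to the Word object library is needed.
    Public Function WordInDocument(ByVal docPath As String, _
                                   ByVal searchWord As String) As Boolean
        Const wdFindStop As Long = 0    'value of Word's wdFindStop constant
        Dim wdApp As Object
        Dim wdDoc As Object
        Dim rng As Object

        On Error GoTo CleanUp
        Set wdApp = CreateObject("Word.Application")
        'Arguments: FileName, ConfirmConversions, ReadOnly
        Set wdDoc = wdApp.Documents.Open(docPath, False, True)
        Set rng = wdDoc.Content

        With rng.Find
            .ClearFormatting
            .Text = searchWord
            .MatchCase = False          'case-insensitive
            .MatchWholeWord = True      'the crucial setting
            .MatchWildcards = False
            .Forward = True
            .Wrap = wdFindStop
            WordInDocument = .Execute   'True when a match is found
        End With

    CleanUp:
        'Close without saving and shut Word down, even after an error
        If Not wdDoc Is Nothing Then wdDoc.Close False
        If Not wdApp Is Nothing Then wdApp.Quit
    End Function
    ```

    When looping over many files it would likely be faster to create the Word.Application object once outside the loop and only open and close documents inside it.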


    Edit: Record a macro

  13. #13
    Addicted Member
    Join Date
    Jul 2007
    Posts
    228

    Re: Is this word in this Word 2007 document ?

    The FindFiles .zip may be a bit over the top for what you want. Did you try the FileSystemObject code I posted? Find function has the code to search the file.

    I need to find whole words - this code finds the requested string inside any text
    This is exactly what I was referring to in my earlier post:
    ... suppose you want to find a document with the word "man". Do you want a return for words like woman, Manfred Mann, Manitoba?
    To avoid returning words like woman you could search for the string " man" with a leading space... but then it would find words like Manitoba. To avoid that your search string could then be " man " with a leading and trailing space. Now the problem would be if the word man was followed by a punctuation mark such as " man, ". The solution there is to check the character after the "n" in man. You could also search with an OR condition, that is, If String = " man " Or String = " man." Or String = " man, ", etc.

    There are many ways to open the file and search as you have seen and now have code for... but unless you code the search for these considerations your search may be less than accurate. This is true for any search routine or program, including Word itself or Windows Search.
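
    The "check the character before and after" idea can be sketched as below. This is an illustration only: the names ContainsWholeWord and IsWordChar are hypothetical, and it assumes a word is a run of the characters A-Z, a-z and 0-9, which will not suit every document (accented letters, apostrophes and hyphens would need a wider pattern).

    ```vb
    ' Hypothetical helper: True if word occurs in source as a whole word,
    ' ignoring case. Every InStr hit is accepted only when the characters
    ' on both sides of it are not letters or digits.
    Public Function ContainsWholeWord(ByVal source As String, _
                                      ByVal word As String) As Boolean
        Dim hit As Long

        If Len(word) = 0 Then Exit Function

        hit = InStr(1, source, word, vbTextCompare)
        Do While hit > 0
            If Not IsWordChar(source, hit - 1) _
               And Not IsWordChar(source, hit + Len(word)) Then
                ContainsWholeWord = True
                Exit Function
            End If
            hit = InStr(hit + 1, source, word, vbTextCompare)
        Loop
    End Function

    ' True if the character at position idx is a letter or digit.
    ' Positions before the start or past the end count as boundaries.
    Private Function IsWordChar(ByVal source As String, _
                                ByVal idx As Long) As Boolean
        If idx < 1 Or idx > Len(source) Then Exit Function
        IsWordChar = Mid$(source, idx, 1) Like "[A-Za-z0-9]"
    End Function
    ```

    With this, ContainsWholeWord("a woman from Manitoba", "man") is False, while ContainsWholeWord("the man, he said", "man") is True, without needing a separate test for each punctuation mark.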

  14. #14

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2005
    Location
    Wellington, NZ
    Posts
    267

    Re: Is this word in this Word 2007 document ?

    Initially I was testing with frm files in my development folder. Open strFileName For Input As #1 then read each record. Make the record lowercase, and replace each punctuation character with a space. Then do an instr to see if the whole word,case-insensitive exists. When I tried this with a doc file, it returned an early EOF - must have been something in the non-ASCII data.

    I thought of opening as Binary which might work.

    Sure RobDog888 it would be slow opening each with Word, but I don't think there would be enough files for this to be a huge problem.

    Had to laugh at your comment "multiple versions of Word". I am trying to replace the use of FileSearch which Microsoft dropped in 2007.

    I need to have a class that requires as little change to the app as possible. Therefore it is to obey the app's use of FileSearch properties. In doing this I can offer recognition of AND OR and parentheses.

    I've done all this. It works fine with text files.

    Word does this. I must be able to utilise this capability.

  15. #15
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Is this word in this Word 2007 document ?

    When I tried this with a doc file, it returned an early EOF - must have been something in the non-ASCII data.
    opening for binary will probably avoid this

  16. #16
    Ex-Super Mod RobDog888's Avatar
    Join Date
    Apr 2001
    Location
    LA, Calif. Raiders #1 AKA:Gangsta Yoda™
    Posts
    60,709

    Re: Is this word in this Word 2007 document ?

    Quote Originally Posted by RobertLees
    I am trying to replace the use of FileSearch which Microsoft dropped in 2007.
    FileSearch was never mentioned in this thread and if you did support multiple versions of Word you could use it if the user was using 2003 or earlier.

  17. #17
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Is this word in this Word 2007 document ?

    FileSearch was never mentioned in this thread and if y
    i noticed that, but there was some other thread by the op in office development, on this topic

    i guess filesearch would be a way to do it without opening each file, though i guess they would be able to read the word file format easily enough, wonder why it is no longer featured in word

  18. #18
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: Is this word in this Word 2007 document ?

    Robert

    Have you tried something like...

    Code:
    Open worddoc For Input As #1
    Line Input #1, textline
    .. where worddoc would be the full pathname of your Word doc, textline
    is a String variable that receives each line,
    and the second statement would be used in a loop?

    I haven't tried that specifically with a Word doc, but do it all the time
    with .txt files. I did at least copy a Word doc and renamed it with a
    .txt extension, then did a simple search. Non-printable characters are
    skipped.. actual text is readable. I would imagine that Open..For Input
    would work in a similar manner (but I may be wrong).

    Spoo

  19. #19
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Is this word in this Word 2007 document ?

    Have you tried something like...
    see post #14

  20. #20
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: Is this word in this Word 2007 document ?

    Quote Originally Posted by westconn1
    see post #14
    Haha.. thanks. I need better glasses

  21. #21
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: Is this word in this Word 2007 document ?

    Quote Originally Posted by RobertLees
    Initially I was testing with frm files in my development folder. Open strFileName For Input As #1 then read each record. Make the record lowercase, and replace each punctuation character with a space. Then do an instr to see if the whole word,case-insensitive exists. When I tried this with a doc file, it returned an early EOF - must have been something in the non-ASCII data.
    Hey, did you try this?

    1. copy the .doc file to a new folder
    2. rename .doc file to a .txt file
    3. do you still get a premature EOF ??

    Spoo

  22. #22
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Is this word in this Word 2007 document ?

    if the file contains an EOF character (ctrl Z) it will give a read past end of file error no matter what the file name / type is, when opened for input. it should work ok if opened for binary

  23. #23
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: Is this word in this Word 2007 document ?

    Good point.. that's what gave the binary approach the special sauce.

  24. #24

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2005
    Location
    Wellington, NZ
    Posts
    267

    Re: Is this word in this Word 2007 document ?

    ...thanks everyone.

    I am pretty sure opening for binary would work, but I abandoned that approach because I thought there would be a way to use Word's inbuilt search facility.

    I inserted this ----- my PC uses (VISTA) OS ----- in a Word document, searched for vista, and I found it. I thought I was on the right track. BUT it also found ist.

    Back to the binary approach. I'll have to do a bit of research into this, but if someone can get me started with binary, it would be appreciated

  25. #25
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: Is this word in this Word 2007 document ?

    Robert

    Something like this might do the trick:

    Code:
        Dim aaBDVid()
        PPath = "d:\bill's stuff\programs for bill\"
        ' 0. open file
        vname = "movie for rob.txt"
        vv = PPath + vname
        ' 1. use Get - create array
        Close #1
        Open vv For Binary As #1
        ReDim aaBDVid(FileLen(vv))
        Get #1, , aaBDVid
        Close #1
        '
    Sorry for obtuse names (it was to read an AVI video file),
    but the logic should be the same. Natch, change file names
    and extensions to meet your needs. Basically, you

    1. Dim an array
    2. Open the file as binary
    3. Dump the contents into the array (which will be a 1-D array)
    4. Close the file, and then work from the array

    The contents of the array will essentially be the ASCII code of
    each character in the file (printable and non-printable).

    Your task will then be to convert your search word into ASCII,
    and then loop through the array looking at nn elements at a time,
    where nn would be the length of the word you are searching for
    (with appropriate lead space and following space|punctuation, etc.).

    HTH
    Spoo
    Last edited by Spoo; Jan 5th, 2009 at 05:03 PM.

  26. #26

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2005
    Location
    Wellington, NZ
    Posts
    267

    Re: Is this word in this Word 2007 document ?

    Thanks Spoo. That is the sort of info I wanted.

  27. #27

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2005
    Location
    Wellington, NZ
    Posts
    267

    Re: Is this word in this Word 2007 document ?

    When I got to Get #1, , aaBDVid, it gave error 458 - Variable uses an Automation type not supported in Visual Basic

  28. #28
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Is this word in this Word 2007 document ?

    once opened for binary, you can also read the file using input or line input

    depending on the size of your files i would just read entire file into a single string, then use instr to find if your search string was included in the file

    vb Code:
    Open "Somefile.doc" For Binary As #1
    mystr = Input(LOF(1), #1)
    Close #1
    pos = InStr(1, mystr, searchstr, vbTextCompare)
    If pos > 0 Then MsgBox "found in this file"
    i know this will work in text files containing end of file characters, and i have tested with word doc, but i can not promise that it will always find the correct answers
    this being the case, the same code will work as you use for text files, except open for binary

  29. #29
    Addicted Member
    Join Date
    Jul 2007
    Posts
    228

    Re: Is this word in this Word 2007 document ?

    Robert:

    Here is a search function designed using the binary method. You may have to make some adaptations but this is the basic method for searching in binary mode:
    Code:
    Function InFileSearch(ByVal sFile As String, Optional ByVal str As String = "") As Boolean
    
    On Error GoTo Errhandler
    Dim f As Integer
    Dim Buf As String
    Dim BufLen As Long
    Dim FoundPos As Long
    
       'Make sure they entered a file and string to search
        str = Trim$(str)
        If str = "" Then Exit Function
        If Trim$(sFile) = "" Then Exit Function
            
        'case insensitive so make all lower case
        str = LCase$(str)
        
        'Open File for Binary Read
        f = FreeFile
        Open sFile For Binary Access Read As f
        BufLen = LOF(f) 'FileLen(sFile)
        
        If BufLen = 0 Then 'empty file so exit out
            Close f
            Exit Function
        End If
        
        'create buffer string to hold the file data
         Buf = Space$(BufLen)
         Get #f, , Buf
         
         'look for case insensitive string
         FoundPos = InStr(LCase$(Buf), LCase$(str))
           If FoundPos Then
            InFileSearch = True
           Else
            InFileSearch = False
           End If
           
           Close f
           Buf = ""
    
        Exit Function
    
    Errhandler:
    
        Close f
        Buf = ""
        MsgBox Error$, vbOKOnly, "Error"
        
    
    
    End Function
    The function returns true if found, false if not found.

  30. #30
    PowerPoster Spoo's Avatar
    Join Date
    Nov 2008
    Location
    Right Coast
    Posts
    2,656

    Re: Is this word in this Word 2007 document ?

    Quote Originally Posted by RobertLees
    When I got to Get #1, , aaBDVid, it gave error 458 - Variable uses an Automation type not supported in Visual Basic
    Sorry, seems that the Dim statement in my earlier post was incomplete.
    I just checked that app and found that I did the following in the
    Declarations section of the form:

    Code:
    Dim aaBDVid() As Byte
    The Get statement is the one that is populating the array for you, thus
    the array type is important.
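
    Putting the corrected declaration together with the Open/Get steps, a sketch might look like this. ReadFileAsText is a hypothetical name, and the conversion assumes the file's text is stored as single-byte (ANSI) characters. That is not guaranteed for Word files: a .doc may hold its text as two-byte Unicode, and a .docx is a ZIP package whose text is compressed, so a raw search of its bytes will generally not see the words at all.

    ```vb
    ' Hypothetical helper: read a whole file into a Byte array and return
    ' it as a String, treating each byte as one ANSI character.
    Public Function ReadFileAsText(ByVal filePath As String) As String
        Dim f As Integer
        Dim bytes() As Byte

        f = FreeFile
        Open filePath For Binary Access Read As #f
        If LOF(f) > 0 Then
            'One element per byte: 0 To LOF - 1
            ReDim bytes(0 To LOF(f) - 1)
            Get #f, , bytes
            'vbUnicode converts the ANSI bytes to a normal VB string
            ReadFileAsText = StrConv(bytes, vbUnicode)
        End If
        Close #f
    End Function
    ```

    The result can then be searched like any other string, for example InStr(1, ReadFileAsText(vv), "vista", vbTextCompare) > 0, where vv is the full path as in the earlier snippet.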

    I hope that does the trick
    Spoo
    Last edited by Spoo; Jan 6th, 2009 at 09:28 AM.
