Results 1 to 14 of 14

Thread: VB6 Search Box to search for text inside multiple word documents

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Nov 2008
    Posts
    170

    VB6 Search Box to search for text inside multiple word documents

    Hi everyone, what i am trying to do is make a search box in vb6 to search for a text inside multiple word files in a folder and display the result(filenames for all the files containing the searched text) in a listbox or flexgrid. For now my code can seach for a single file and displays result as a message box.

    But i don't want to open the files while they are being searched.

    Can anyone help me achieve this more or less like a windows search box that searches for a "text or a phrase" in multiple files without opening them and dsiplays the results as file icons.

    Moreover i would really appreciate if someone has a similar solution for .pdf files as well

    thanx a big time!!!!
    ________________________________________________________
    My code:

    Private Sub Command1_Click()
    Dim find As String
    find = txtfind.Text
    Set objWord = CreateObject("Word.Application")
    objWord.Visible = True
    Set objDoc = objWord.Documents.Open("C:\Hello.doc")
    Set objSelection = objWord.Selection
    objSelection.find.Text = find
    objSelection.find.Forward = True
    objSelection.find.MatchWholeWord = True
    If objSelection.find.Execute Then
    MsgBox "The search text was found."
    Else
    MsgBox "The search text was not found."
    End If
    End Sub
    _______________________________________________________________

  2. #2
    Member
    Join Date
    Jun 2007
    Posts
    53

    Re: VB6 Search Box to search for text inside multiple word documents

    Here ya go buddy add a command button, textbox named txtDirectory and a listbox , in the textbox put the directory eg: C:\folder or whatever and where it says find what you can add another textbox an use a variable course you'd have to modify it a little to work as you need it but anyone else has suggestions should post em

    Private Sub Command1_Click()

    dir_name = txtDirectory.Text
    If Right$(dir_name, 1) <> "\" Then dir_name = dir_name _
    & "\"
    file_name = Dir$(dir_name & "*.*")
    Do While Len(file_name) > 0
    pos = InStrRev(file_name, ".")
    If pos = 0 Then
    ext = ""
    Else
    ext = LCase$(Mid$(file_name, pos))
    End If

    Select Case ext
    'include file extensions add your own
    Case ".txt", ".rtf", ".doc"
    Dim sSearch As String
    Dim sName As String

    Open dir_name & "\" & file_name For Input As #1
    sSearch = Input(LOF(1), #1)
    Close #1
    sName = InStr(1, sSearch, "FINDWHAT")
    If sName > 0 Then
    List1.AddItem file_name
    Else
    Exit Sub
    End If
    End Select
    GetNextFile:
    file_name = Dir$()
    Loop
    End Sub
    Last edited by Evilribbet; Jan 9th, 2009 at 04:24 PM.

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Nov 2008
    Posts
    170

    Re: VB6 Search Box to search for text inside multiple word documents

    Thanx!! but when i try i get the following error:
    Runtime error 62 "input past end of file"

  4. #4
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: VB6 Search Box to search for text inside multiple word documents

    that is appears to be normal when opening words docs for input
    try opening for binary, i believe that would work based on couple of tests

    see this thread doing the same thing http://www.vbforums.com/showthread.php?t=552279

    i believe it will be difficult to find a word at the beginning or end of the document
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  5. #5

    Thread Starter
    Addicted Member
    Join Date
    Nov 2008
    Posts
    170

    Re: VB6 Search Box to search for text inside multiple word documents

    thanx westconn1 i am trying to get my code right with the forum thread you mentioned. Actually it is not exactly what i want but yes there is a lot of valuable info i am going through.....cheers!

  6. #6
    Junior Member
    Join Date
    Aug 2015
    Posts
    27

    Re: VB6 Search Box to search for text inside multiple word documents

    Quote Originally Posted by Evilribbet View Post
    Here ya go buddy add a command button, textbox named txtDirectory and a listbox , in the textbox put the directory eg: C:\folder or whatever and where it says find what you can add another textbox an use a variable course you'd have to modify it a little to work as you need it but anyone else has suggestions should post em
    awesome piece of code thank you ^_^ thou it doesnt work on any document built after office 2007 i mean docx or any X version of office any method to search on those as well? it seems they are some sorta zip files contain lots of xml files inside ...

  7. #7
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: VB6 Search Box to search for text inside multiple word documents

    you should be able to unzip then read the xml as above or using xml parser
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  8. #8
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,913

    Re: VB6 Search Box to search for text inside multiple word documents

    Yes, you've already gotten some pretty good advice, and it does make a great deal of difference if we're talking about a .DOC or a .DOCX file.

    Also, you say that you ...

    Quote Originally Posted by irfi View Post
    don't want to open the files while they are being searched.
    I assume that you mean you don't want to use Word automation from VB6. You must be clear about those things. If you don't wish to open the files in any way, then I can't imagine how you could search them.


    It's been a while since I've messed this but I have in the past. For the older .DOC files, I believe they put all of their text in the files as ASCII/ANSI (although the entire file is not ASCII/ANSI). Therefore, they can be searched directly, opening them as binary. Here's a binary file search routine I often use with Excel files (with the older .XLS files also placing text in the files as ASCII/ANSI). Also, this code has the advantage of being able to handle a file of any size, because it doesn't read it all into memory.

    Code:
    Option Explicit
    
    Public Function BinaryFileSearch(sFileSpec As String, sSearchString As String, Optional bCaseSensitive As Boolean = True, _
                                     Optional lStartPosition As Long = 1, Optional lFoundPosition As Long, _
                                     Optional lFileHandleToUse As Long = 0) As Boolean
        ' Returns true if sSearchString is found, else false.
        ' sSearchString can be no longer than 128.
        ' This will work even if Word or Excel has the file open.
        ' The lFoundPosition is a return argument.
        '    It returns the latest position before lStartPosition (if there isn't one after lStartPosition) or
        '    it returns the earliest position after lStartPosition.
        Dim iFle As Long
        Dim FileData As String
        Dim FilePointer As Long
        Dim FileLength As Long
        Dim sFind As String
        Dim iPos As Long
        '
        If Len(sSearchString) > 128 Then
            Error 1234
            Exit Function
        End If
        '
        If lFileHandleToUse = 0 Then
            If Not bFileExists(sFileSpec) Then Exit Function
            iFle = FreeFile
            On Error Resume Next
            Open sFileSpec For Binary As iFle
            If Err <> 0 Then Close iFle: Exit Function
            On Error GoTo 0
            If Len(iFle) = 0 Then Close iFle: Exit Function
        Else
            iFle = lFileHandleToUse ' The file MUST be opened BINARY for this to work.
        End If
        '
        If bCaseSensitive Then
            sFind = sSearchString
        Else
            sFind = LCase$(sSearchString)
        End If
        FileData = Space(1024)
        FileLength = LOF(iFle)
        FilePointer = lStartPosition
        Do
            If FilePointer > FileLength Then Exit Do
            Get iFle, FilePointer, FileData
            If Not bCaseSensitive Then FileData = LCase$(FileData)
            iPos = InStr(FileData, sFind)
            If iPos <> 0 Then
                lFoundPosition = FilePointer + iPos - 1
                If lFoundPosition >= lStartPosition Then
                    BinaryFileSearch = True
                    Exit Do
                End If
            End If
            FilePointer = ((FilePointer + 1024) - Len(sFind)) + 1
        Loop
        If lFileHandleToUse = 0 Then Close iFle
    End Function
    
    Public Function bFileExists(fle As String) As Boolean
        On Error GoTo FileExistsError
        ' If no error then something existed.
        bFileExists = (GetAttr(fle) And vbDirectory) = 0
        Exit Function
    FileExistsError:
        bFileExists = False
        Exit Function
    End Function

    Now, westconn1 already mentioned this, but the newer .DOCX files are actually a zipped collection of files. If you use something like 7zip, it's easy to see this. Just right-click the .DOCX file, say 7Zip / Open Archive, and they'll open right up. Once open, you'll want to go into the "word" folder and you'll see a document named "document.xml". That's an ANSI file that you can easily search. Therefore, with these newer .DOCX files, you've got the extra step of unzipping them and extracting this "document.xml" file into some temporary location.

    Regarding the temporary location, I'd tend to use the API and get the Windows temp folder for this. Here's code to do that:

    Code:
    Option Explicit
    '
    Private Declare Function GetTempPath Lib "kernel32" Alias "GetTempPathA" (ByVal nBufferLength As Long, ByVal lpBuffer As String) As Long
    '
    
    Public Function WindowsTempPath()
        Dim s As String
        Const MAX_PATH = 1024 ' Maximum long filename path length
        '
        s = Space$(MAX_PATH)
        GetTempPath MAX_PATH, s
        s = Left$(s, InStr(s, vbNullChar) - 1)
        If Right$(s, 1) <> "\" Then s = s & "\"
        WindowsTempPath = s
    End Function
    Now, regarding unzipping a file, this isn't something I regularly do. I know that there are several threads around here that discuss this. And there's probably several threads in the VB6 codebank. However, here's a piece of code I've got in my VB6 junk-drawer. I can't say it's well tested, but here it is. Also, this requires a DLL reference to the "Microsoft Shell Controls And Automation" (shell32.dll). I'll leave you on your own to work that out, but it should be pretty easy. If you're on a 32-bit OS, that DLL should be in your Windows/System32 folder. If you're on a 64-bit OS, it should be in Windows/SysWOW64.

    Code:
    '
    ' Set reference to "Microsoft Shell Controls and Automation" to use this procedure!!!!!!!!!!
    '
    ' Be aware when using the shell automation interface to unzip files as it
    ' leaves copies of the zip files in the temp directory (defined by %TEMP%).
    ' Folders named "Temporary Directory X for demo.zip" are generated where X
    ' is a sequential number from 1 - 99.  When it reaches 99 you will then get
    ' a error dialog saying "The file exists" and it will not continue.
    ' I've no idea why Windows doesn't clean up after itself when unzipping files,
    ' but it is most annoying...
    '
    ' Ziping a file using the Windows Shell API creates another thread where the zipping is executed.
    ' This means that it is possible that this app would end before the zipping thread
    ' starts to execute which would cause the zip to never occur and you will end up with just
    ' an empty zip file. So wait a second and give the zipping thread time to get started.
    '
    ' If we're sure the program will continue then we can skip the pause.
    ' However, we must also be careful to not use unzipped files before they are fully unzipped.
    '
    Option Explicit
    '
    Public Enum ZipFlagsEnum
        ZipFlagDefault = 0              ' Default. No options specified.
        ZipFlagNoProgBar = 4            ' Do not display a progress dialog box.
        ZipFlagRenameTarget = 8         ' Rename the target file if a file exists at the target location with the same name.
        ZipFlagYesToAll = 16            ' Click "Yes to All" in any dialog box displayed.
        ZipFlagPreserveUndo = 64        ' Preserve undo information, if possible.
        ZipFlagOnlyWildcard = 128       ' Perform the operation only if a wildcard file name (*.*) is specified.
        ZipFlagProgNoName = 256         ' Display a progress dialog box but do not show the file names.
        ZipFlagNoConfirm = 512          ' Do not confirm the creation of a new directory if the operation requires one to be created.
        ZipFlagNoInterface = 1024       ' Do not display a user interface if an error occurs.
        ZipFlagNoSecurity = 2048        ' Do not copy security attributes.
        ZipFlagNoRecursion = 4096       ' Disable recursion through sub-folders.
        ZipFlagNoConnectedFiles = 8192  ' Do not copy connected files as a group. Only copy the specified files.
    End Enum
    '
    
    Private Sub CreateEmptyZipFile(sZipFileSpec As String)
        Dim EmptyZip() As Byte
        Dim i As Long
        Dim iFle As Long
        '
        ' Create zip header.
        ReDim EmptyZip(1 To 22)
        EmptyZip(1) = 80
        EmptyZip(2) = 75
        EmptyZip(3) = 5
        EmptyZip(4) = 6
        For i = 5 To UBound(EmptyZip)
            EmptyZip(i) = 0
        Next
        ' Create empty zip file with header.
        iFle = FreeFile
        Open sZipFileSpec For Binary Access Write As #iFle
        For i = LBound(EmptyZip) To UBound(EmptyZip)
            Put #iFle, , EmptyZip(i)
        Next
        Close #iFle
    End Sub
    
    Public Sub AddFileToZipFile(sSourceFileSpec As String, sZipFileSpec As String, _
                                Optional ZipFlags As ZipFlagsEnum = ZipFlagNoProgBar + ZipFlagYesToAll + ZipFlagNoConfirm + ZipFlagNoInterface)
        ' The zip file must already exist.
        Dim ShellClass  As Shell32.Shell
        Dim FileDest    As Shell32.Folder
        '
        Set ShellClass = New Shell32.Shell
        Set FileDest = ShellClass.NameSpace(sZipFileSpec)
        FileDest.CopyHere sSourceFileSpec, ZipFlags
    End Sub
    
    Public Sub AddFolderToZipFile(sSourceFolder As String, sZipFileSpec As String, _
                                  Optional ZipFlags As ZipFlagsEnum = ZipFlagNoProgBar + ZipFlagYesToAll + ZipFlagNoConfirm + ZipFlagNoInterface)
        ' The zip file must already exist.
        ' All sub-folders are added unless the ZipFlagNoRecursion flag is used.
        Dim ShellClass  As Shell32.Shell
        Dim FileSource  As Shell32.Folder
        Dim FileDest    As Shell32.Folder
        Dim FolderItems As Shell32.FolderItems
        '
        Set ShellClass = New Shell32.Shell
        Set FileSource = ShellClass.NameSpace(sSourceFolder)
        Set FileDest = ShellClass.NameSpace(sZipFileSpec)
        Set FolderItems = FileSource.Items
        FileDest.CopyHere FolderItems, ZipFlags
    End Sub
    
    Public Sub ExtractZipFileToFolder(sZipFileSpec As String, sDestFolder As String, _
                                      Optional ZipFlags As ZipFlagsEnum = ZipFlagNoProgBar + ZipFlagYesToAll + ZipFlagNoConfirm + ZipFlagNoInterface)
        ' The zip file must already exist.
        Dim ShellClass  As Shell32.Shell
        Dim FileSource  As Shell32.Folder
        Dim FileDest    As Shell32.Folder
        Dim FolderItems As Shell32.FolderItems
        '
        Set ShellClass = New Shell32.Shell
        Set FileSource = ShellClass.NameSpace(sZipFileSpec)
        Set FileDest = ShellClass.NameSpace(sDestFolder)
        Set FolderItems = FileSource.Items
        FileDest.CopyHere FolderItems, ZipFlags
    End Sub
    Now, you haven't specified how you're finding these Word documents. For simplicity, I'll assume that you know where they are. In that case, it's just some judicious use of the Dir$ function. If you don't know where they are, or if you wish to traverse all sub-folders, that's a more nettlesome problem. There's been some excellent code posted in the codebank lately regarding this. I've got my own, but I think the code in the codebank is better (or, at least, faster) than mine.

    Good Luck,
    Elroy

    EDIT: Regarding PDF files, I'm less familiar with the internal format of those. However, that BinarySearch procedure I gave you may work. It's definitely worth a try. However, they may use a UTF-16 or UTF-8 storage method, which presents a whole other set of problems. Or, they may embed formatting in with the text, which also causes problems for doing searches, especially if/when parts of a word or sentence are formatted differently from other parts. Word used to do that many moons ago, but they quit doing it LONG ago. I'm not as positive about the XML documents, but, as westconn1 stated, it really shouldn't be much work to parse out (and throw away) all the XML codes, leaving just the text.

    EDIT2: I've given you many (if not all) of the pieces, but it's up to you to fit them together.

    EDIT3: It dawns on me that maybe you want to tap into the Windows search routines from VB6 using some kind of API calls. That's something I've never done, and would be interested to see how to do it. If that's what you're trying to do, I look forward to the replies of others.
    Last edited by Elroy; Aug 16th, 2016 at 06:34 AM.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  9. #9
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: VB6 Search Box to search for text inside multiple word documents

    Also, you say that you ...
    that was a long time ago
    i only answered the recent addition
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  10. #10
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    10,913

    Re: VB6 Search Box to search for text inside multiple word documents

    LOL, haha, I just realized that this was the resurrection of an old post.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  11. #11
    Default Member Bonnie West's Avatar
    Join Date
    Jun 2012
    Location
    InIDE
    Posts
    4,060

    Re: VB6 Search Box to search for text inside multiple word documents

    Quote Originally Posted by Elroy View Post
    Code:
    ' Be aware when using the shell automation interface to unzip files as it
    ' leaves copies of the zip files in the temp directory (defined by %TEMP%).
    ' Folders named "Temporary Directory X for demo.zip" are generated where X
    ' is a sequential number from 1 - 99.  When it reaches 99 you will then get
    ' a error dialog saying "The file exists" and it will not continue.
    ' I've no idea why Windows doesn't clean up after itself when unzipping files,
    ' but it is most annoying...
    The similar code here shows one possible way of removing those temporary files and folders.
    On Local Error Resume Next: If Not Empty Is Nothing Then Do While Null: ReDim i(True To False) As Currency: Loop: Else Debug.Assert CCur(CLng(CInt(CBool(False Imp True Xor False Eqv True)))): Stop: On Local Error GoTo 0
    Declare Sub CrashVB Lib "msvbvm60" (Optional DontPassMe As Any)

  12. #12
    Member Dragokas's Avatar
    Join Date
    Aug 2015
    Location
    Ukraine on fire (country of slaves)
    Posts
    750

    Re: VB6 Search Box to search for text inside multiple word documents

    Quote Originally Posted by Elroy
    EDIT: Regarding PDF files, I'm less familiar with the internal format of those. However, that BinarySearch procedure I gave you may work.
    PDF use a series of different compression methods, e.g. gzip + code to ASCII code mapping table. Also, it can be encrypted.
    So, direct search willn't be working here.

    You could use Adobe acrobat object model, but, unfortunately, X Pro version is a fee program.

    There are many pdf parsers over internet. Personally, I not used them. E.g. console: pdftk + inkscape, pdfedit, and found this Xpdf, which also support decryption and written on c++, so it's possible to recompile it into dll to use with VB6.
    Malware analyst, VirusNet developer, HiJackThis+ author || my CodeBank works

  13. #13
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: VB6 Search Box to search for text inside multiple word documents

    You can also use Windows Search for this.

    To search common file formats Windows already has a number of IFilters installed. These cover common image formats, text, and Word formats. If you want PDF searching most Acrobat Viewer packages for the last decade install a filter, and if you want formats like DOCX you can install 2007 Office System Converter: Microsoft Filter Pack.

    Windows Search offers both high and low level APIs. I posted a demo a while back in Query Windows Search.


    Within its limitations it offers a lot of power with minimal effort. There is no need to open and read and search tons of files yourself. Desktop search technologies are pretty mature at this late date and Windows has one built in. Unless your needs are specialized I'm not sure why you'd bother with anything else.

  14. #14
    Member Dragokas's Avatar
    Join Date
    Aug 2015
    Location
    Ukraine on fire (country of slaves)
    Posts
    750

    Re: VB6 Search Box to search for text inside multiple word documents

    Windows Search offers both high and low level APIs. I posted a demo a while back in Query Windows Search.
    This is a very interesting application of the Windows Search.
    I will take on board.

    Regards,
    Alex.
    Malware analyst, VirusNet developer, HiJackThis+ author || my CodeBank works

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width