VB6 Search Box to search for text inside multiple word documents
Hi everyone, what i am trying to do is make a search box in vb6 to search for a text inside multiple word files in a folder and display the result(filenames for all the files containing the searched text) in a listbox or flexgrid. For now my code can seach for a single file and displays result as a message box.
But i don't want to open the files while they are being searched.
Can anyone help me achieve this more or less like a windows search box that searches for a "text or a phrase" in multiple files without opening them and dsiplays the results as file icons.
Moreover i would really appreciate if someone has a similar solution for .pdf files as well
thanx a big time!!!!
________________________________________________________
My code:
Private Sub Command1_Click()
Dim find As String
find = txtfind.Text
Set objWord = CreateObject("Word.Application")
objWord.Visible = True
Set objDoc = objWord.Documents.Open("C:\Hello.doc")
Set objSelection = objWord.Selection
objSelection.find.Text = find
objSelection.find.Forward = True
objSelection.find.MatchWholeWord = True
If objSelection.find.Execute Then
MsgBox "The search text was found."
Else
MsgBox "The search text was not found."
End If
End Sub
_______________________________________________________________
Re: VB6 Search Box to search for text inside multiple word documents
Here ya go buddy add a command button, textbox named txtDirectory and a listbox , in the textbox put the directory eg: C:\folder or whatever and where it says find what you can add another textbox an use a variable course you'd have to modify it a little to work as you need it but anyone else has suggestions should post em
Private Sub Command1_Click()
dir_name = txtDirectory.Text
If Right$(dir_name, 1) <> "\" Then dir_name = dir_name _
& "\"
file_name = Dir$(dir_name & "*.*")
Do While Len(file_name) > 0
pos = InStrRev(file_name, ".")
If pos = 0 Then
ext = ""
Else
ext = LCase$(Mid$(file_name, pos))
End If
Select Case ext
'include file extensions add your own
Case ".txt", ".rtf", ".doc"
Dim sSearch As String
Dim sName As String
Open dir_name & "\" & file_name For Input As #1
sSearch = Input(LOF(1), #1)
Close #1
sName = InStr(1, sSearch, "FINDWHAT")
If sName > 0 Then
List1.AddItem file_name
Else
Exit Sub
End If
End Select
GetNextFile:
file_name = Dir$()
Loop
End Sub
Re: VB6 Search Box to search for text inside multiple word documents
Thanx!! but when i try i get the following error:
Runtime error 62 "input past end of file"
Re: VB6 Search Box to search for text inside multiple word documents
that is appears to be normal when opening words docs for input
try opening for binary, i believe that would work based on couple of tests
see this thread doing the same thing http://www.vbforums.com/showthread.php?t=552279
i believe it will be difficult to find a word at the beginning or end of the document
Re: VB6 Search Box to search for text inside multiple word documents
thanx westconn1 i am trying to get my code right with the forum thread you mentioned. Actually it is not exactly what i want but yes there is a lot of valuable info i am going through.....cheers!
Re: VB6 Search Box to search for text inside multiple word documents
Quote:
Originally Posted by
Evilribbet
Here ya go buddy add a command button, textbox named txtDirectory and a listbox , in the textbox put the directory eg: C:\folder or whatever and where it says find what you can add another textbox an use a variable course you'd have to modify it a little to work as you need it but anyone else has suggestions should post em
awesome piece of code thank you ^_^ thou it doesnt work on any document built after office 2007 i mean docx or any X version of office any method to search on those as well? it seems they are some sorta zip files contain lots of xml files inside ...
Re: VB6 Search Box to search for text inside multiple word documents
you should be able to unzip then read the xml as above or using xml parser
Re: VB6 Search Box to search for text inside multiple word documents
Yes, you've already gotten some pretty good advice, and it does make a great deal of difference if we're talking about a .DOC or a .DOCX file.
Also, you say that you ...
Quote:
Originally Posted by
irfi
don't want to open the files while they are being searched.
I assume that you mean you don't want to use Word automation from VB6. You must be clear about those things. If you don't wish to open the files in any way, then I can't imagine how you could search them.
It's been a while since I've messed this but I have in the past. For the older .DOC files, I believe they put all of their text in the files as ASCII/ANSI (although the entire file is not ASCII/ANSI). Therefore, they can be searched directly, opening them as binary. Here's a binary file search routine I often use with Excel files (with the older .XLS files also placing text in the files as ASCII/ANSI). Also, this code has the advantage of being able to handle a file of any size, because it doesn't read it all into memory.
Code:
Option Explicit
Public Function BinaryFileSearch(sFileSpec As String, sSearchString As String, Optional bCaseSensitive As Boolean = True, _
Optional lStartPosition As Long = 1, Optional lFoundPosition As Long, _
Optional lFileHandleToUse As Long = 0) As Boolean
' Returns true if sSearchString is found, else false.
' sSearchString can be no longer than 128.
' This will work even if Word or Excel has the file open.
' The lFoundPosition is a return argument.
' It returns the latest position before lStartPosition (if there isn't one after lStartPosition) or
' it returns the earliest position after lStartPosition.
Dim iFle As Long
Dim FileData As String
Dim FilePointer As Long
Dim FileLength As Long
Dim sFind As String
Dim iPos As Long
'
If Len(sSearchString) > 128 Then
Error 1234
Exit Function
End If
'
If lFileHandleToUse = 0 Then
If Not bFileExists(sFileSpec) Then Exit Function
iFle = FreeFile
On Error Resume Next
Open sFileSpec For Binary As iFle
If Err <> 0 Then Close iFle: Exit Function
On Error GoTo 0
If Len(iFle) = 0 Then Close iFle: Exit Function
Else
iFle = lFileHandleToUse ' The file MUST be opened BINARY for this to work.
End If
'
If bCaseSensitive Then
sFind = sSearchString
Else
sFind = LCase$(sSearchString)
End If
FileData = Space(1024)
FileLength = LOF(iFle)
FilePointer = lStartPosition
Do
If FilePointer > FileLength Then Exit Do
Get iFle, FilePointer, FileData
If Not bCaseSensitive Then FileData = LCase$(FileData)
iPos = InStr(FileData, sFind)
If iPos <> 0 Then
lFoundPosition = FilePointer + iPos - 1
If lFoundPosition >= lStartPosition Then
BinaryFileSearch = True
Exit Do
End If
End If
FilePointer = ((FilePointer + 1024) - Len(sFind)) + 1
Loop
If lFileHandleToUse = 0 Then Close iFle
End Function
Public Function bFileExists(fle As String) As Boolean
On Error GoTo FileExistsError
' If no error then something existed.
bFileExists = (GetAttr(fle) And vbDirectory) = 0
Exit Function
FileExistsError:
bFileExists = False
Exit Function
End Function
Now, westconn1 already mentioned this, but the newer .DOCX files are actually a zipped collection of files. If you use something like 7zip, it's easy to see this. Just right-click the .DOCX file, say 7Zip / Open Archive, and they'll open right up. Once open, you'll want to go into the "word" folder and you'll see a document named "document.xml". That's an ANSI file that you can easily search. Therefore, with these newer .DOCX files, you've got the extra step of unzipping them and extracting this "document.xml" file into some temporary location.
Regarding the temporary location, I'd tend to use the API and get the Windows temp folder for this. Here's code to do that:
Code:
Option Explicit
'
Private Declare Function GetTempPath Lib "kernel32" Alias "GetTempPathA" (ByVal nBufferLength As Long, ByVal lpBuffer As String) As Long
'
Public Function WindowsTempPath()
Dim s As String
Const MAX_PATH = 1024 ' Maximum long filename path length
'
s = Space$(MAX_PATH)
GetTempPath MAX_PATH, s
s = Left$(s, InStr(s, vbNullChar) - 1)
If Right$(s, 1) <> "\" Then s = s & "\"
WindowsTempPath = s
End Function
Now, regarding unzipping a file, this isn't something I regularly do. I know that there are several threads around here that discuss this. And there's probably several threads in the VB6 codebank. However, here's a piece of code I've got in my VB6 junk-drawer. I can't say it's well tested, but here it is. Also, this requires a DLL reference to the "Microsoft Shell Controls And Automation" (shell32.dll). I'll leave you on your own to work that out, but it should be pretty easy. If you're on a 32-bit OS, that DLL should be in your Windows/System32 folder. If you're on a 64-bit OS, it should be in Windows/SysWOW64.
Code:
'
' Set reference to "Microsoft Shell Controls and Automation" to use this procedure!!!!!!!!!!
'
' Be aware when using the shell automation interface to unzip files as it
' leaves copies of the zip files in the temp directory (defined by %TEMP%).
' Folders named "Temporary Directory X for demo.zip" are generated where X
' is a sequential number from 1 - 99. When it reaches 99 you will then get
' a error dialog saying "The file exists" and it will not continue.
' I've no idea why Windows doesn't clean up after itself when unzipping files,
' but it is most annoying...
'
' Ziping a file using the Windows Shell API creates another thread where the zipping is executed.
' This means that it is possible that this app would end before the zipping thread
' starts to execute which would cause the zip to never occur and you will end up with just
' an empty zip file. So wait a second and give the zipping thread time to get started.
'
' If we're sure the program will continue then we can skip the pause.
' However, we must also be careful to not use unzipped files before they are fully unzipped.
'
Option Explicit
'
Public Enum ZipFlagsEnum
ZipFlagDefault = 0 ' Default. No options specified.
ZipFlagNoProgBar = 4 ' Do not display a progress dialog box.
ZipFlagRenameTarget = 8 ' Rename the target file if a file exists at the target location with the same name.
ZipFlagYesToAll = 16 ' Click "Yes to All" in any dialog box displayed.
ZipFlagPreserveUndo = 64 ' Preserve undo information, if possible.
ZipFlagOnlyWildcard = 128 ' Perform the operation only if a wildcard file name (*.*) is specified.
ZipFlagProgNoName = 256 ' Display a progress dialog box but do not show the file names.
ZipFlagNoConfirm = 512 ' Do not confirm the creation of a new directory if the operation requires one to be created.
ZipFlagNoInterface = 1024 ' Do not display a user interface if an error occurs.
ZipFlagNoSecurity = 2048 ' Do not copy security attributes.
ZipFlagNoRecursion = 4096 ' Disable recursion through sub-folders.
ZipFlagNoConnectedFiles = 8192 ' Do not copy connected files as a group. Only copy the specified files.
End Enum
'
Private Sub CreateEmptyZipFile(sZipFileSpec As String)
Dim EmptyZip() As Byte
Dim i As Long
Dim iFle As Long
'
' Create zip header.
ReDim EmptyZip(1 To 22)
EmptyZip(1) = 80
EmptyZip(2) = 75
EmptyZip(3) = 5
EmptyZip(4) = 6
For i = 5 To UBound(EmptyZip)
EmptyZip(i) = 0
Next
' Create empty zip file with header.
iFle = FreeFile
Open sZipFileSpec For Binary Access Write As #iFle
For i = LBound(EmptyZip) To UBound(EmptyZip)
Put #iFle, , EmptyZip(i)
Next
Close #iFle
End Sub
Public Sub AddFileToZipFile(sSourceFileSpec As String, sZipFileSpec As String, _
Optional ZipFlags As ZipFlagsEnum = ZipFlagNoProgBar + ZipFlagYesToAll + ZipFlagNoConfirm + ZipFlagNoInterface)
' The zip file must already exist.
Dim ShellClass As Shell32.Shell
Dim FileDest As Shell32.Folder
'
Set ShellClass = New Shell32.Shell
Set FileDest = ShellClass.NameSpace(sZipFileSpec)
FileDest.CopyHere sSourceFileSpec, ZipFlags
End Sub
Public Sub AddFolderToZipFile(sSourceFolder As String, sZipFileSpec As String, _
Optional ZipFlags As ZipFlagsEnum = ZipFlagNoProgBar + ZipFlagYesToAll + ZipFlagNoConfirm + ZipFlagNoInterface)
' The zip file must already exist.
' All sub-folders are added unless the ZipFlagNoRecursion flag is used.
Dim ShellClass As Shell32.Shell
Dim FileSource As Shell32.Folder
Dim FileDest As Shell32.Folder
Dim FolderItems As Shell32.FolderItems
'
Set ShellClass = New Shell32.Shell
Set FileSource = ShellClass.NameSpace(sSourceFolder)
Set FileDest = ShellClass.NameSpace(sZipFileSpec)
Set FolderItems = FileSource.Items
FileDest.CopyHere FolderItems, ZipFlags
End Sub
Public Sub ExtractZipFileToFolder(sZipFileSpec As String, sDestFolder As String, _
Optional ZipFlags As ZipFlagsEnum = ZipFlagNoProgBar + ZipFlagYesToAll + ZipFlagNoConfirm + ZipFlagNoInterface)
' The zip file must already exist.
Dim ShellClass As Shell32.Shell
Dim FileSource As Shell32.Folder
Dim FileDest As Shell32.Folder
Dim FolderItems As Shell32.FolderItems
'
Set ShellClass = New Shell32.Shell
Set FileSource = ShellClass.NameSpace(sZipFileSpec)
Set FileDest = ShellClass.NameSpace(sDestFolder)
Set FolderItems = FileSource.Items
FileDest.CopyHere FolderItems, ZipFlags
End Sub
Now, you haven't specified how you're finding these Word documents. For simplicity, I'll assume that you know where they are. In that case, it's just some judicious use of the Dir$ function. If you don't know where they are, or if you wish to traverse all sub-folders, that's a more nettlesome problem. There's been some excellent code posted in the codebank lately regarding this. I've got my own, but I think the code in the codebank is better (or, at least, faster) than mine.
Good Luck,
Elroy
EDIT: Regarding PDF files, I'm less familiar with the internal format of those. However, that BinarySearch procedure I gave you may work. It's definitely worth a try. However, they may use a UTF-16 or UTF-8 storage method, which presents a whole other set of problems. Or, they may embed formatting in with the text, which also causes problems for doing searches, especially if/when parts of a word or sentence are formatted differently from other parts. Word used to do that many moons ago, but they quit doing it LONG ago. I'm not as positive about the XML documents, but, as westconn1 stated, it really shouldn't be much work to parse out (and throw away) all the XML codes, leaving just the text.
EDIT2: I've given you many (if not all) of the pieces, but it's up to you to fit them together.
EDIT3: It dawns on me that maybe you want to tap into the Windows search routines from VB6 using some kind of API calls. That's something I've never done, and would be interested to see how to do it. If that's what you're trying to do, I look forward to the replies of others.
Re: VB6 Search Box to search for text inside multiple word documents
Quote:
Also, you say that you ...
that was a long time ago
i only answered the recent addition
Re: VB6 Search Box to search for text inside multiple word documents
LOL, haha, I just realized that this was the resurrection of an old post.
Re: VB6 Search Box to search for text inside multiple word documents
Quote:
Originally Posted by
Elroy
Code:
' Be aware when using the shell automation interface to unzip files as it
' leaves copies of the zip files in the temp directory (defined by %TEMP%).
' Folders named "Temporary Directory X for demo.zip" are generated where X
' is a sequential number from 1 - 99. When it reaches 99 you will then get
' a error dialog saying "The file exists" and it will not continue.
' I've no idea why Windows doesn't clean up after itself when unzipping files,
' but it is most annoying...
The similar code here shows one possible way of removing those temporary files and folders.
Re: VB6 Search Box to search for text inside multiple word documents
Quote:
Originally Posted by Elroy
EDIT: Regarding PDF files, I'm less familiar with the internal format of those. However, that BinarySearch procedure I gave you may work.
PDF use a series of different compression methods, e.g. gzip + code to ASCII code mapping table. Also, it can be encrypted.
So, direct search willn't be working here.
You could use Adobe acrobat object model, but, unfortunately, X Pro version is a fee program.
There are many pdf parsers over internet. Personally, I not used them. E.g. console: pdftk + inkscape, pdfedit, and found this Xpdf, which also support decryption and written on c++, so it's possible to recompile it into dll to use with VB6.
Re: VB6 Search Box to search for text inside multiple word documents
You can also use Windows Search for this.
To search common file formats Windows already has a number of IFilters installed. These cover common image formats, text, and Word formats. If you want PDF searching most Acrobat Viewer packages for the last decade install a filter, and if you want formats like DOCX you can install 2007 Office System Converter: Microsoft Filter Pack.
Windows Search offers both high and low level APIs. I posted a demo a while back in Query Windows Search.
Within its limitations it offers a lot of power with minimal effort. There is no need to open and read and search tons of files yourself. Desktop search technologies are pretty mature at this late date and Windows has one built in. Unless your needs are specialized I'm not sure why you'd bother with anything else.
Re: VB6 Search Box to search for text inside multiple word documents
Quote:
Windows Search offers both high and low level APIs. I posted a demo a while back in Query Windows Search.
This is a very interesting application of the Windows Search.
I will take on board.
Regards,
Alex.