|
-
Aug 16th, 2016, 06:14 AM
#8
Re: VB6 Search Box to search for text inside multiple word documents
Yes, you've already gotten some pretty good advice, and it does make a great deal of difference if we're talking about a .DOC or a .DOCX file.
Also, you say that you ...
 Originally Posted by irfi
don't want to open the files while they are being searched.
I assume that you mean you don't want to use Word automation from VB6. You must be clear about those things. If you don't wish to open the files in any way, then I can't imagine how you could search them.
It's been a while since I've messed this but I have in the past. For the older .DOC files, I believe they put all of their text in the files as ASCII/ANSI (although the entire file is not ASCII/ANSI). Therefore, they can be searched directly, opening them as binary. Here's a binary file search routine I often use with Excel files (with the older .XLS files also placing text in the files as ASCII/ANSI). Also, this code has the advantage of being able to handle a file of any size, because it doesn't read it all into memory.
Code:
Option Explicit
Public Function BinaryFileSearch(sFileSpec As String, sSearchString As String, Optional bCaseSensitive As Boolean = True, _
Optional lStartPosition As Long = 1, Optional lFoundPosition As Long, _
Optional lFileHandleToUse As Long = 0) As Boolean
' Returns true if sSearchString is found, else false.
' sSearchString can be no longer than 128.
' This will work even if Word or Excel has the file open.
' The lFoundPosition is a return argument.
' It returns the latest position before lStartPosition (if there isn't one after lStartPosition) or
' it returns the earliest position after lStartPosition.
Dim iFle As Long
Dim FileData As String
Dim FilePointer As Long
Dim FileLength As Long
Dim sFind As String
Dim iPos As Long
'
If Len(sSearchString) > 128 Then
Error 1234
Exit Function
End If
'
If lFileHandleToUse = 0 Then
If Not bFileExists(sFileSpec) Then Exit Function
iFle = FreeFile
On Error Resume Next
Open sFileSpec For Binary As iFle
If Err <> 0 Then Close iFle: Exit Function
On Error GoTo 0
If Len(iFle) = 0 Then Close iFle: Exit Function
Else
iFle = lFileHandleToUse ' The file MUST be opened BINARY for this to work.
End If
'
If bCaseSensitive Then
sFind = sSearchString
Else
sFind = LCase$(sSearchString)
End If
FileData = Space(1024)
FileLength = LOF(iFle)
FilePointer = lStartPosition
Do
If FilePointer > FileLength Then Exit Do
Get iFle, FilePointer, FileData
If Not bCaseSensitive Then FileData = LCase$(FileData)
iPos = InStr(FileData, sFind)
If iPos <> 0 Then
lFoundPosition = FilePointer + iPos - 1
If lFoundPosition >= lStartPosition Then
BinaryFileSearch = True
Exit Do
End If
End If
FilePointer = ((FilePointer + 1024) - Len(sFind)) + 1
Loop
If lFileHandleToUse = 0 Then Close iFle
End Function
Public Function bFileExists(fle As String) As Boolean
On Error GoTo FileExistsError
' If no error then something existed.
bFileExists = (GetAttr(fle) And vbDirectory) = 0
Exit Function
FileExistsError:
bFileExists = False
Exit Function
End Function
Now, westconn1 already mentioned this, but the newer .DOCX files are actually a zipped collection of files. If you use something like 7zip, it's easy to see this. Just right-click the .DOCX file, say 7Zip / Open Archive, and they'll open right up. Once open, you'll want to go into the "word" folder and you'll see a document named "document.xml". That's an ANSI file that you can easily search. Therefore, with these newer .DOCX files, you've got the extra step of unzipping them and extracting this "document.xml" file into some temporary location.
Regarding the temporary location, I'd tend to use the API and get the Windows temp folder for this. Here's code to do that:
Code:
Option Explicit
'
Private Declare Function GetTempPath Lib "kernel32" Alias "GetTempPathA" (ByVal nBufferLength As Long, ByVal lpBuffer As String) As Long
'
Public Function WindowsTempPath()
Dim s As String
Const MAX_PATH = 1024 ' Maximum long filename path length
'
s = Space$(MAX_PATH)
GetTempPath MAX_PATH, s
s = Left$(s, InStr(s, vbNullChar) - 1)
If Right$(s, 1) <> "\" Then s = s & "\"
WindowsTempPath = s
End Function
Now, regarding unzipping a file, this isn't something I regularly do. I know that there are several threads around here that discuss this. And there's probably several threads in the VB6 codebank. However, here's a piece of code I've got in my VB6 junk-drawer. I can't say it's well tested, but here it is. Also, this requires a DLL reference to the "Microsoft Shell Controls And Automation" (shell32.dll). I'll leave you on your own to work that out, but it should be pretty easy. If you're on a 32-bit OS, that DLL should be in your Windows/System32 folder. If you're on a 64-bit OS, it should be in Windows/SysWOW64.
Code:
'
' Set reference to "Microsoft Shell Controls and Automation" to use this procedure!!!!!!!!!!
'
' Be aware when using the shell automation interface to unzip files as it
' leaves copies of the zip files in the temp directory (defined by %TEMP%).
' Folders named "Temporary Directory X for demo.zip" are generated where X
' is a sequential number from 1 - 99. When it reaches 99 you will then get
' a error dialog saying "The file exists" and it will not continue.
' I've no idea why Windows doesn't clean up after itself when unzipping files,
' but it is most annoying...
'
' Ziping a file using the Windows Shell API creates another thread where the zipping is executed.
' This means that it is possible that this app would end before the zipping thread
' starts to execute which would cause the zip to never occur and you will end up with just
' an empty zip file. So wait a second and give the zipping thread time to get started.
'
' If we're sure the program will continue then we can skip the pause.
' However, we must also be careful to not use unzipped files before they are fully unzipped.
'
Option Explicit
'
Public Enum ZipFlagsEnum
ZipFlagDefault = 0 ' Default. No options specified.
ZipFlagNoProgBar = 4 ' Do not display a progress dialog box.
ZipFlagRenameTarget = 8 ' Rename the target file if a file exists at the target location with the same name.
ZipFlagYesToAll = 16 ' Click "Yes to All" in any dialog box displayed.
ZipFlagPreserveUndo = 64 ' Preserve undo information, if possible.
ZipFlagOnlyWildcard = 128 ' Perform the operation only if a wildcard file name (*.*) is specified.
ZipFlagProgNoName = 256 ' Display a progress dialog box but do not show the file names.
ZipFlagNoConfirm = 512 ' Do not confirm the creation of a new directory if the operation requires one to be created.
ZipFlagNoInterface = 1024 ' Do not display a user interface if an error occurs.
ZipFlagNoSecurity = 2048 ' Do not copy security attributes.
ZipFlagNoRecursion = 4096 ' Disable recursion through sub-folders.
ZipFlagNoConnectedFiles = 8192 ' Do not copy connected files as a group. Only copy the specified files.
End Enum
'
Private Sub CreateEmptyZipFile(sZipFileSpec As String)
Dim EmptyZip() As Byte
Dim i As Long
Dim iFle As Long
'
' Create zip header.
ReDim EmptyZip(1 To 22)
EmptyZip(1) = 80
EmptyZip(2) = 75
EmptyZip(3) = 5
EmptyZip(4) = 6
For i = 5 To UBound(EmptyZip)
EmptyZip(i) = 0
Next
' Create empty zip file with header.
iFle = FreeFile
Open sZipFileSpec For Binary Access Write As #iFle
For i = LBound(EmptyZip) To UBound(EmptyZip)
Put #iFle, , EmptyZip(i)
Next
Close #iFle
End Sub
Public Sub AddFileToZipFile(sSourceFileSpec As String, sZipFileSpec As String, _
Optional ZipFlags As ZipFlagsEnum = ZipFlagNoProgBar + ZipFlagYesToAll + ZipFlagNoConfirm + ZipFlagNoInterface)
' The zip file must already exist.
Dim ShellClass As Shell32.Shell
Dim FileDest As Shell32.Folder
'
Set ShellClass = New Shell32.Shell
Set FileDest = ShellClass.NameSpace(sZipFileSpec)
FileDest.CopyHere sSourceFileSpec, ZipFlags
End Sub
Public Sub AddFolderToZipFile(sSourceFolder As String, sZipFileSpec As String, _
Optional ZipFlags As ZipFlagsEnum = ZipFlagNoProgBar + ZipFlagYesToAll + ZipFlagNoConfirm + ZipFlagNoInterface)
' The zip file must already exist.
' All sub-folders are added unless the ZipFlagNoRecursion flag is used.
Dim ShellClass As Shell32.Shell
Dim FileSource As Shell32.Folder
Dim FileDest As Shell32.Folder
Dim FolderItems As Shell32.FolderItems
'
Set ShellClass = New Shell32.Shell
Set FileSource = ShellClass.NameSpace(sSourceFolder)
Set FileDest = ShellClass.NameSpace(sZipFileSpec)
Set FolderItems = FileSource.Items
FileDest.CopyHere FolderItems, ZipFlags
End Sub
Public Sub ExtractZipFileToFolder(sZipFileSpec As String, sDestFolder As String, _
Optional ZipFlags As ZipFlagsEnum = ZipFlagNoProgBar + ZipFlagYesToAll + ZipFlagNoConfirm + ZipFlagNoInterface)
' The zip file must already exist.
Dim ShellClass As Shell32.Shell
Dim FileSource As Shell32.Folder
Dim FileDest As Shell32.Folder
Dim FolderItems As Shell32.FolderItems
'
Set ShellClass = New Shell32.Shell
Set FileSource = ShellClass.NameSpace(sZipFileSpec)
Set FileDest = ShellClass.NameSpace(sDestFolder)
Set FolderItems = FileSource.Items
FileDest.CopyHere FolderItems, ZipFlags
End Sub
Now, you haven't specified how you're finding these Word documents. For simplicity, I'll assume that you know where they are. In that case, it's just some judicious use of the Dir$ function. If you don't know where they are, or if you wish to traverse all sub-folders, that's a more nettlesome problem. There's been some excellent code posted in the codebank lately regarding this. I've got my own, but I think the code in the codebank is better (or, at least, faster) than mine.
Good Luck,
Elroy
EDIT: Regarding PDF files, I'm less familiar with the internal format of those. However, that BinarySearch procedure I gave you may work. It's definitely worth a try. However, they may use a UTF-16 or UTF-8 storage method, which presents a whole other set of problems. Or, they may embed formatting in with the text, which also causes problems for doing searches, especially if/when parts of a word or sentence are formatted differently from other parts. Word used to do that many moons ago, but they quit doing it LONG ago. I'm not as positive about the XML documents, but, as westconn1 stated, it really shouldn't be much work to parse out (and throw away) all the XML codes, leaving just the text.
EDIT2: I've given you many (if not all) of the pieces, but it's up to you to fit them together.
EDIT3: It dawns on me that maybe you want to tap into the Windows search routines from VB6 using some kind of API calls. That's something I've never done, and would be interested to see how to do it. If that's what you're trying to do, I look forward to the replies of others.
Last edited by Elroy; Aug 16th, 2016 at 06:34 AM.
Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|