Fastest way to search for text in a file.
I have a binary file of about 20 MB and a list of strings, and I need to check whether each string exists in the file. I tried InStr with vbTextCompare (the search needs to be case-insensitive), but it is extremely slow.
BTW, I did search the forum first, but most answers just use InStr with vbTextCompare.
Re: Fastest way to search for text in a file.
InStr is probably your best choice here. I know it is pretty slow, but there probably aren't many alternatives.
Re: Fastest way to search for text in a file.
Check out the binary compare functions posted in the CodeBank.
Re: Fastest way to search for text in a file.
Thanks Mxjerrett and WestConn
Quote:
Originally Posted by westconn1
check out binary compare functions posted in codebank
West, several came up in the search. The only problem with a binary comparison is that it is case-sensitive.
Re: Fastest way to search for text in a file.
Quote:
Originally Posted by Liquid Metal
Thanks Mxjerrett and WestConn
West, several came up in the search. The only problem with a binary comparison is that it is case-sensitive.
Convert both strings with UCase() or LCase() before comparing.
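A minimal sketch of that suggestion for a list of search terms, assuming the file data is already in a String. `FindTermsNoCase` is a hypothetical helper name, not a built-in. The data is lower-cased once and each term once, so every search can use the fast vbBinaryCompare:

```vb
' Hypothetical helper: for each entry in Terms, returns True if it occurs
' in strData, ignoring case. Lower-casing both sides lets InStr use the
' fast binary comparison instead of vbTextCompare.
Public Function FindTermsNoCase(ByVal strData As String, Terms() As String) As Boolean()
    Dim blnFound() As Boolean
    Dim i As Long

    ReDim blnFound(LBound(Terms) To UBound(Terms))
    strData = LCase$(strData)               ' lower-case the data only once
    For i = LBound(Terms) To UBound(Terms)
        blnFound(i) = (InStr(1, strData, LCase$(Terms(i)), vbBinaryCompare) > 0)
    Next
    FindTermsNoCase = blnFound
End Function
```

Note that InStr returns 1 for an empty search string, so an empty term is reported as found.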
Re: Fastest way to search for text in a file.
Take a look at this function: InBArrBM
It is especially fast compared to TextCompare. It isn't optimal (I could make it better these days), but my guess is that it is much faster than InStr for what you're doing.
It uses Boyer-Moore, which can skip ahead through the data instead of checking every position the way InStr's brute-force search does. That better algorithm is why it can beat InStr.
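For reference, here is a minimal sketch of the skip-ahead idea using the simpler Boyer-Moore-Horspool variant (bad-character table only) rather than full Boyer-Moore. It is not the InBArrBM code from the CodeBank. `FindBytesNoCase` is a hypothetical name, and the sketch assumes single-byte (ANSI) text, folds only A-Z to a-z, and expects a 0-based, dimensioned byte array:

```vb
' Case-insensitive Boyer-Moore-Horspool search in a Byte array.
' Hypothetical helper (not InBArrBM). Assumes single-byte (ANSI) text,
' folds only A-Z to a-z, and expects abytData to be 0-based and dimensioned.
' Returns the 0-based index of the first match at or after lngStart, or -1.
Public Function FindBytesNoCase(abytData() As Byte, ByVal strFind As String, _
                                Optional ByVal lngStart As Long = 0) As Long
    Dim abytFold(0 To 255) As Byte    ' case-folding table
    Dim alngShift(0 To 255) As Long   ' bad-character shift table
    Dim abytPat() As Byte             ' folded search term
    Dim i As Long, j As Long, m As Long, lngLast As Long, lngPos As Long

    FindBytesNoCase = -1
    If Len(strFind) = 0 Then Exit Function

    ' Fold table: A-Z map to a-z, every other byte maps to itself
    For i = 0 To 255
        abytFold(i) = i
    Next
    For i = 65 To 90
        abytFold(i) = i + 32
    Next

    ' Convert the term to ANSI bytes and fold it with the same table
    abytPat = StrConv(strFind, vbFromUnicode)
    m = UBound(abytPat) + 1
    For i = 0 To m - 1
        abytPat(i) = abytFold(abytPat(i))
    Next

    ' Default shift is the full term length; bytes that occur in the term
    ' (except its last byte) shift by their distance from the end
    For i = 0 To 255
        alngShift(i) = m
    Next
    For i = 0 To m - 2
        alngShift(abytPat(i)) = m - 1 - i
    Next

    lngLast = UBound(abytData)
    lngPos = lngStart
    Do While lngPos + m - 1 <= lngLast
        ' Compare the window right to left, folding data bytes on the fly
        j = m - 1
        Do While abytFold(abytData(lngPos + j)) = abytPat(j)
            If j = 0 Then
                FindBytesNoCase = lngPos
                Exit Function
            End If
            j = j - 1
        Loop
        ' Skip ahead based on the byte under the end of the window
        lngPos = lngPos + alngShift(abytFold(abytData(lngPos + m - 1)))
    Loop
End Function
```

Because case folding happens through a 256-entry table during the comparison, the data never has to go through LCase$. With "winxml" (6 bytes), any byte under the end of the window that does not appear in the first five bytes of the term lets the search jump 6 bytes at once.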
Re: Fastest way to search for text in a file.
Hi Merri,
I got it to work, but I'm not sure exactly what it is benchmarking. I believe it compares the two command buttons against each other, and does the list box contain the data types? Can you explain it a little?
Re: Fastest way to search for text in a file.
Quote:
Originally Posted by PMad
Convert both strings with UCase() or LCase() before comparing.
:thumb: Good idea, and it worked. The only problem now is that I realized I have to break the file into smaller chunks in case it has to handle a really big file.
Thanks
Re: Fastest way to search for text in a file.
Well, ignore the benchmarker and just rip the function :D
Note that running UCase$ or LCase$ over an entire massive file, whether in small or big chunks, is very slow.
Re: Fastest way to search for text in a file.
Quote:
Originally Posted by Merri
Note that running UCase$ or LCase$ over an entire massive file, whether in small or big chunks, is very slow.
Completely agree. :thumb: Actually, just loading an entire file into memory is bad enough. Do you know of an example that reads the file chunk by chunk with Get? I am tinkering with one right now but haven't been able to get it to work yet.
Re: Fastest way to search for text in a file.
With a byte array, you simply dimension the array to the chunk size, then keep reading the file until the number of bytes left to read is smaller than the chunk size. If more than zero bytes remain, resize the byte array to that count and read the last bytes. You can use FileLen to get the length of a file into a variable.
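A minimal sketch of that approach. `ReadInChunks` is a hypothetical name, it assumes lngChunkSize is greater than zero, and the comments mark where the search would go. Get is called without a position, so each call continues from where the previous one stopped:

```vb
' Hypothetical helper: reads a file in fixed-size chunks and returns the
' number of chunks read. Assumes lngChunkSize > 0.
Public Function ReadInChunks(ByVal strPath As String, ByVal lngChunkSize As Long) As Long
    Dim intFile As Integer
    Dim lngRemaining As Long
    Dim lngChunks As Long
    Dim abytChunk() As Byte

    lngRemaining = FileLen(strPath)
    intFile = FreeFile
    Open strPath For Binary Access Read As #intFile

    ' Full-size chunks: the array holds exactly lngChunkSize bytes
    ReDim abytChunk(0 To lngChunkSize - 1)
    Do While lngRemaining >= lngChunkSize
        Get #intFile, , abytChunk        ' reads from the current file position
        lngRemaining = lngRemaining - lngChunkSize
        lngChunks = lngChunks + 1
        ' ... search abytChunk here ...
    Loop

    ' Last, shorter chunk: resize the array to the bytes that are left
    If lngRemaining > 0 Then
        ReDim abytChunk(0 To lngRemaining - 1)
        Get #intFile, , abytChunk
        lngChunks = lngChunks + 1
        ' ... search abytChunk here ...
    End If

    Close #intFile
    ReadInChunks = lngChunks
End Function
```

When searching, keep in mind that a match can straddle two chunks. One way to handle that is to start each new chunk Len(term) - 1 bytes before the end of the previous one.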
Re: Fastest way to search for text in a file.
I can't remember where I got this code, but it was from one of the members here who was helping someone shred a file. I tweaked it to my needs. Can you check the logic for me?
Code:
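' Open file (read access is enough for searching)
Dim intFreeFile As Integer
intFreeFile = FreeFile
Open strShredFile For Binary Access Read As #intFreeFile
' Get total length
Dim lngLOF As Long
lngLOF = LOF(intFreeFile)
' Search term (lower case) and chunk size in bytes
Const cstrFind As String = "winxml"
Const clngMAXSize As Long = 32& * 1024&
' Consecutive chunks overlap by Len(cstrFind) - 1 bytes, so a match that
' straddles a chunk boundary is found exactly once
Dim lngOverlap As Long
lngOverlap = Len(cstrFind) - 1
Dim byteArr() As Byte
Dim strData As String
Dim lngPos As Long      ' 1-based file position of the current chunk
Dim lngChunk As Long    ' number of bytes in the current chunk
Dim lngFound As Long    ' position of a match inside strData
Dim lngCounter As Long  ' total number of matches (Long: Integer overflows at 32767)
lngPos = 1
Do While lngPos <= lngLOF
    ' Full-size chunk, or whatever is left for the final one
    lngChunk = lngLOF - lngPos + 1
    If lngChunk > clngMAXSize Then lngChunk = clngMAXSize
    ReDim byteArr(0 To lngChunk - 1)   ' exactly lngChunk bytes
    Get #intFreeFile, lngPos, byteArr
    ' Assumes single-byte (ANSI) text: StrConv turns each byte into one
    ' character, so positions in strData match byte offsets in the chunk.
    ' LCase(byteArr) would instead read every two bytes as one UTF-16 character.
    strData = LCase$(StrConv(byteArr, vbUnicode))
    lngFound = 0
    Do
        lngFound = InStr(lngFound + 1, strData, cstrFind, vbBinaryCompare)
        If lngFound = 0 Then Exit Do
        lngCounter = lngCounter + 1
    Loop
    ' Stop after the final chunk; otherwise step forward, keeping the overlap
    If lngPos + lngChunk > lngLOF Then Exit Do
    lngPos = lngPos + lngChunk - lngOverlap
Loop
Close #intFreeFile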