
Thread: Find a string in a text file Fast....

  1. #1

    Thread Starter
    Frenzied Member Jmacp
    Join Date
    Jul 2003
    Location
    UK
    Posts
    1,959

    Find a string in a text file Fast....

    This demonstrates a way to search any file and pull out the full line that contains a given search string, which is faster than the ordinary string-buffer method. It uses the API (CreateFile/ReadFile) to open the file and load it into a buffer, then uses InStr on the buffer to find a match. It also compares this against the ordinary string-buffer method: the API version wins by a factor of about 3.5 on a 1 MB text file, which is not bad.

    You could also open the file in Binary mode and use InStr; I didn't bother with that, though.
    Attached Files
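    A minimal VB6 sketch of the Binary-mode variant mentioned above, assuming an ANSI text file with CRLF or LF line endings; GetLineContaining is a hypothetical name, not code from the attachment. It reads the whole file into a Byte array with one Get, converts it once with StrConv, finds the match with InStr and then uses InStrRev/InStr to cut out the full line.

    ```vb
    ' Hypothetical helper: returns the full line that contains strFind, or "" if there is none.
    ' Assumes an ANSI text file with CRLF or LF line endings.
    Public Function GetLineContaining(ByVal strFileName As String, ByVal strFind As String) As String
        Dim intFile As Integer, bytData() As Byte, strData As String
        Dim lngPos As Long, lngStart As Long, lngEnd As Long

        If LenB(strFind) = 0 Then Exit Function
        intFile = FreeFile
        Open strFileName For Binary Access Read As #intFile
        If LOF(intFile) = 0 Then Close #intFile: Exit Function
        ReDim bytData(0 To LOF(intFile) - 1)
        Get #intFile, 1, bytData                  ' one read for the whole file
        Close #intFile

        strData = StrConv(bytData, vbUnicode)     ' single ANSI -> Unicode conversion
        Erase bytData
        lngPos = InStr(1, strData, strFind, vbBinaryCompare)
        If lngPos = 0 Then Exit Function

        lngStart = InStrRev(strData, vbLf, lngPos) + 1   ' just after the previous line break
        lngEnd = InStr(lngPos, strData, vbLf)            ' the next line break, if any
        If lngEnd = 0 Then lngEnd = Len(strData) + 1
        If lngEnd > lngStart Then
            If Mid$(strData, lngEnd - 1, 1) = vbCr Then lngEnd = lngEnd - 1   ' drop the CR of CRLF
        End If
        GetLineContaining = Mid$(strData, lngStart, lngEnd - lngStart)
    End Function
    ```
    
    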

  2. #2
    I'm about to be a PowerPoster! Hack
    Join Date
    Aug 2001
    Location
    Searching for mendhak
    Posts
    58,333

    Re: Find a string in a text file Fast....

    Fine job and I needed this today. It came in handy!

  3. #3
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    I wrote competing API code that works slightly differently. Instead of running StrConv over all the data, it reads the data straight into a string variable. I also removed the complex InStr + InStrRev code and simply compared InStr vs. InStrB with a plain search for "abc".

    With an 11.8 MB file, memory usage with my function was around 12-13 MB, while Jmacp's original method jumped to around 36-37 MB.

    On the speed side the differences are greater: my code is some 5-6 times faster, and the difference grows as the file size grows.


    Just a reminder of the byte versions of the string functions: InStrB, LeftB$, MidB$, RightB$, LenB, ChrB$, AscB. Using these gives "true" binary file handling with string functions, as no text conversion takes place. Also, if you're reading UTF-8 data, it is much more straightforward to pass this kind of string to the Windows string conversion function and get back a string that is ready to use in VB6.
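    A rough sketch of the byte-string idea, assuming a single-byte ANSI file: assigning a Byte array to a String copies the bytes as-is (two file bytes per VB character), so only the search term needs StrConv(..., vbFromUnicode), and InStrB returns the 1-based byte offset of the hit. This uses plain Binary-mode Get rather than the CreateFile/ReadFile calls from the attachment, and FindInFileB is a hypothetical name.

    ```vb
    ' Hypothetical helper: 1-based byte offset of strFind in the file, or 0 if not found.
    ' Assumes a single-byte ANSI file; nothing is converted except the search term.
    Public Function FindInFileB(ByVal strFileName As String, ByVal strFind As String) As Long
        Dim intFile As Integer, bytData() As Byte, strData As String

        If LenB(strFind) = 0 Then Exit Function
        intFile = FreeFile
        Open strFileName For Binary Access Read As #intFile
        If LOF(intFile) = 0 Then Close #intFile: Exit Function
        ReDim bytData(0 To LOF(intFile) - 1)
        Get #intFile, 1, bytData
        Close #intFile

        strData = bytData      ' Byte array -> String: bytes copied as-is, no ANSI/Unicode conversion
        Erase bytData          ' free the array; the string now holds the data
        ' The search term is packed the same way (one byte per character) before InStrB.
        FindInFileB = InStrB(1, strData, StrConv(strFind, vbFromUnicode), vbBinaryCompare)
    End Function
    ```
    
    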

    Edit!
    If you also need an ANSI Split (something like a SplitB), see my QuickSplitB sub.
    Attached Files
    Last edited by Merri; Sep 20th, 2008 at 03:12 PM.

  4. #4

    Thread Starter
    Frenzied Member Jmacp
    Join Date
    Jul 2003
    Location
    UK
    Posts
    1,959

    Re: Find a string in a text file Fast....

    I should have said that my code obviously wasn't polished. I was just throwing in some ideas; the whole InStr/InStrRev part was just to get to the end point quickly. The API ReadFile/CreateFile part was the real substance. I'm sure your version is better, well done!

  5. #5
    New Member
    Join Date
    Nov 2008
    Posts
    2

    Re: Find a string in a text file Fast....

    Hello Merri,
    I integrated your modified function into my application and it's by far the fastest file search I've used and makes the application perfectly usable now. It took just 4 minutes to scoot through a total of 6GB of CSV data. My question to you is: is it possible to use your function in conjunction with a regular expression pattern instead of a fixed search string, and still maintain its performance?

  6. #6
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    That would depend on the regular expression engine: if you can pass it a pointer to a buffer in memory, and if you can make repeated calls without the pattern being re-analyzed each time, then you could achieve pretty good speeds.

    In comparison, if you had to pass "normal" strings that needed a conversion, that would cause a massive amount of extra work, a bit like Jmacp's original code compared to mine.


    It is all about keeping data unmodified as much as possible.
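    As a hedged illustration of the "don't re-analyze the pattern each time" point, here is a sketch using the late-bound VBScript.RegExp object, assumed to be available on the machine. The object is created once and kept in a Static variable so repeated calls reuse it; whether the engine caches its parsed pattern internally is an assumption, so measure before relying on it. RegExp only accepts ordinary (Unicode) VB strings, so strText must already be converted, which is exactly the extra cost described above. CountRegexMatches is a hypothetical name.

    ```vb
    ' Hypothetical helper: number of matches of strPattern in strText.
    ' strText must be an ordinary (Unicode) VB string; RegExp cannot work on byte strings.
    ' strText is passed ByRef so a large buffer is not copied on every call.
    Public Function CountRegexMatches(strText As String, ByVal strPattern As String) As Long
        Static objRegex As Object                  ' kept between calls, created only once

        If objRegex Is Nothing Then
            Set objRegex = CreateObject("VBScript.RegExp")
            objRegex.Global = True                 ' find all matches, not just the first
        End If
        If objRegex.Pattern <> strPattern Then objRegex.Pattern = strPattern   ' reset only when it changes
        CountRegexMatches = objRegex.Execute(strText).Count
    End Function
    ```
    
    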
    Last edited by Merri; Nov 23rd, 2008 at 04:55 PM.

  7. #7
    New Member
    Join Date
    Nov 2008
    Posts
    2

    Re: Find a string in a text file Fast....

    If anyone has the time to investigate this for me, the attached test project will save some time.

    Thanks.
    Attached Files
    Last edited by stakemaster; Nov 29th, 2008 at 07:17 PM.

  8. #8
    Frenzied Member
    Join Date
    Apr 2003
    Location
    The Future - Skynet
    Posts
    1,157

    Re: Find a string in a text file Fast....

    Merri, your method works great, but for some reason it craps out on files that are 500,000 KB. Do you have any idea why?
    I'll Be Back!

    T-1000

    Microsoft .Net 2005
    Microsoft Visual Basic 6
    Prefer using API

  9. #9
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    If you are reading files that big you may wish to consider using some level of buffering instead of reading the whole file to memory at once.
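    One way to add buffering, sketched under the assumptions of a single-byte ANSI file and a search term shorter than the chunk size: read fixed-size chunks into one reusable Byte array, and start each new chunk LenB(term) - 1 bytes before the end of the previous one so a match straddling two chunks is not missed. The last, partial chunk is trimmed with LeftB$. FindInFileChunked and its 1 MB default chunk size are hypothetical choices, not code from the thread.

    ```vb
    ' Hypothetical helper: 1-based byte offset of strFind in the file, or 0 if not found.
    ' Assumes a single-byte ANSI file; lngChunk must be larger than the search term.
    Public Function FindInFileChunked(ByVal strFileName As String, ByVal strFind As String, _
                                      Optional ByVal lngChunk As Long = 1048576) As Long
        Dim intFile As Integer, bytBuf() As Byte, strBuf As String, strFindB As String
        Dim lngFileLen As Long, lngPos As Long, lngRead As Long, lngHit As Long, lngOverlap As Long

        strFindB = StrConv(strFind, vbFromUnicode)     ' search term, one byte per character
        lngOverlap = LenB(strFindB) - 1                 ' bytes re-read so boundary matches are caught
        If lngOverlap < 0 Or lngChunk <= lngOverlap Then Exit Function

        intFile = FreeFile
        Open strFileName For Binary Access Read As #intFile
        lngFileLen = LOF(intFile)
        ReDim bytBuf(0 To lngChunk - 1)                 ' allocated once, refilled on every pass
        lngPos = 1
        Do While lngPos <= lngFileLen
            lngRead = lngFileLen - lngPos + 1
            If lngRead > lngChunk Then lngRead = lngChunk
            Get #intFile, lngPos, bytBuf
            strBuf = bytBuf                              ' raw copy, no conversion
            If lngRead < lngChunk Then strBuf = LeftB$(strBuf, lngRead)   ' trim the partial last chunk
            lngHit = InStrB(1, strBuf, strFindB, vbBinaryCompare)
            If lngHit > 0 Then
                FindInFileChunked = lngPos + lngHit - 1  ' convert to an offset in the whole file
                Exit Do
            End If
            If lngPos + lngRead > lngFileLen Then Exit Do  ' that was the last chunk
            lngPos = lngPos + lngRead - lngOverlap
        Loop
        Close #intFile
        strBuf = vbNullString
    End Function
    ```

    With a 4-byte chunk, a match such as "abc" that starts in the first chunk and ends in the second is still found, because the second read starts two bytes back.
    
    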

  10. #10
    Frenzied Member
    Join Date
    Apr 2003
    Location
    The Future - Skynet
    Posts
    1,157

    Re: Find a string in a text file Fast....

    Quote, originally posted by Merri:
    If you are reading files that big you may wish to consider using some level of buffering instead of reading the whole file to memory at once.
    Agreed, but can your code still be applied? It seems that your API code reads the whole file at once.

    Is it similar to this one:
    http://www.vbforums.com/showpost.php...0&postcount=12

    If yes, can you help me clean it up? I would really appreciate your help on this.

  11. #11
    Frenzied Member
    Join Date
    Apr 2003
    Location
    The Future - Skynet
    Posts
    1,157

    Re: Find a string in a text file Fast....

    Hi Merri, I got your code to work on big files. It does run faster than any of the other methods, but there is a memory leak.

    If you run it in Excel VBA, you can see the memory usage keep growing in Task Manager.

  12. #12
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    The leak is probably introduced by the way you use the code. For example, the PutMem4 part of the code places the created string into a string variable. If you do this in a loop and never assign vbNullString to that variable, the strings are never freed from memory and you keep hogging more and more memory.

  13. #13
    Frenzied Member
    Join Date
    Apr 2003
    Location
    The Future - Skynet
    Posts
    1,157

    Re: Find a string in a text file Fast....

    Quote, originally posted by Merri:
    The leak is probably introduced in your way of using the code. For example, the PutMem4 part of the code places the created string into a string variable. If you do this in a loop and never use vbNullString to the string variable you never free the strings from memory and thus you keep on hogging more memory.
    I believe that is accurate. Is there an API to free up the memory? When I set the string to "", it clears it out, but that has to be done inside the loop.

  14. #14
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    You can't use "" because that allocates an empty string; you must use vbNullString. In this case assigning vbNullString is also faster than calling an API.

    Note that you can also create the buffer just once and keep refilling it; you don't need to create it over and over again. Clearing the buffer with vbNullString when you are done would be good practice (that was left out of that example... and it should have more comments).
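    A small check of the "" vs vbNullString difference, using VB6's StrPtr: a variable holding vbNullString has no string buffer at all (pointer 0), while "" is a real, allocated empty string. The second half shows the "allocate once, refill, release" pattern with an ordinary string and the Mid$ statement rather than the PutMem4 technique from the thread; the sub name is hypothetical.

    ```vb
    ' Hypothetical demo of "" vs vbNullString and of reusing one buffer.
    Public Sub CheckNullStringBehaviour()
        Dim strTest As String, strBuf As String, i As Long

        Debug.Assert StrPtr(strTest) = 0      ' a fresh String variable has no buffer
        strTest = ""
        Debug.Assert StrPtr(strTest) <> 0     ' "" is a real, allocated zero-length string
        strTest = vbNullString
        Debug.Assert StrPtr(strTest) = 0      ' vbNullString releases it again

        strBuf = Space$(64)                   ' allocate the buffer once, before the loop
        For i = 1 To 3
            Mid$(strBuf, 1) = "pass " & CStr(i)   ' overwrite in place; strBuf keeps its buffer
        Next
        Debug.Assert Left$(strBuf, 6) = "pass 3"
        strBuf = vbNullString                 ' release the buffer when finished
    End Sub
    ```
    
    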

  15. #15
    Frenzied Member
    Join Date
    Apr 2003
    Location
    The Future - Skynet
    Posts
    1,157

    Re: Find a string in a text file Fast....

    I did what you recommended and changed the "" to vbNullString. That was a good recommendation.

    I also did what you said and only create the buffer once, by moving the stringAlloc API call out of the loop. Somehow that turned my data into gibberish. Was I supposed to move the PutMem4 out of the loop too?

  16. #16
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    Yes, the string allocation and PutMem4 should always go together and be as close to each other as possible, so if you move one you must move the other. Also, at the end of the file, where the buffer will be larger than the remaining data, you must shrink the buffer. The easiest and probably fastest way is LeftB$.

  17. #17
    New Member
    Join Date
    Feb 2010
    Posts
    7

    Re: Find a string in a text file Fast....

    Hi Merri, thanks for your post. It's very fast.

    I have a problem: if I search a txt file for IB and the file contains GOTLIB, it finds a match. Is there some way to get a positive hit only if the file contains IB as a word on its own?

  18. #18
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    To keep it relatively fast you need to code additional conditions once you have found a match. Basically you check the character before and after the match to see whether it is or isn't something you want to be there. If the characters are what you don't want then you search again.

    Alternatively, if there is always a specific character before and after the string to be found, such as a line break, then you can simply include those characters in your search.
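    The "check the characters around the match, otherwise search again" approach might look like this for ordinary VB strings; the same logic carries over to InStrB/MidB$ on byte strings. It assumes letters, digits and underscore count as word characters, which you may need to adjust. FindWholeWord and IsWordChar are hypothetical names.

    ```vb
    ' Assumed definition of a word character: letter, digit or underscore.
    Private Function IsWordChar(ByVal strChar As String) As Boolean
        IsWordChar = (strChar Like "[A-Za-z0-9_]")
    End Function

    ' Hypothetical helper: position of strWord in strText as a whole word, or 0.
    ' strText is passed ByRef so a large buffer is not copied on every call.
    Public Function FindWholeWord(strText As String, ByVal strWord As String, _
                                  Optional ByVal lngStart As Long = 1) As Long
        Dim lngPos As Long, lngAfter As Long, blnOk As Boolean

        If LenB(strWord) = 0 Then Exit Function
        lngPos = InStr(lngStart, strText, strWord, vbBinaryCompare)
        Do While lngPos > 0
            lngAfter = lngPos + Len(strWord)
            blnOk = True
            ' The character before the match (if any) must not be a word character...
            If lngPos > 1 Then blnOk = Not IsWordChar(Mid$(strText, lngPos - 1, 1))
            ' ...and neither may the character after it.
            If blnOk And lngAfter <= Len(strText) Then blnOk = Not IsWordChar(Mid$(strText, lngAfter, 1))
            If blnOk Then FindWholeWord = lngPos: Exit Function
            lngPos = InStr(lngPos + 1, strText, strWord, vbBinaryCompare)   ' search again
        Loop
    End Function
    ```
    
    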

  19. #19
    New Member
    Join Date
    Feb 2010
    Posts
    7

    Re: Find a string in a text file Fast....

    Thanks Merri for your reply.

    There isn't any specific way to identify the words, so I have an MDB file with over 10,000 words that the program has to go through. The program takes 30 minutes to finish, so I'm looking for code that can reduce that time.

  20. #20
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    Open up a thread in the Classic VB forum and post some of the code you use. People will probably be able to point out the issues in your existing code; in the best case, just a few small things need to change to bring the speed to bearable levels. Also, try to explain what information you need, i.e. do you just need to know that the word is in the file, or is there something more?

  21. #21
    New Member
    Join Date
    Mar 2010
    Location
    Reading, UK
    Posts
    2

    Re: Find a string in a text file Fast....

    Merri, thanks for the code. I don't really understand it, but I have cribbed your project and put one line,
    Label2.Caption = InStrB(API_Merri, Find)
    into a loop, with the Find string being read from a file. I output the time it takes to process each batch of 1000 searches, and it shows the search getting slower and slower:
    1-1000: 2 secs
    1000-2000: 3 secs
    2000-3000: 4 secs, etc.
    The length of the Find string does not change. I have found that if I remove the InStrB search, or if the Find string is a constant, the speed does not deteriorate. The speed improves with shorter Find strings, and whether or not the Find string is found makes no difference. Memory usage does not increase either.
    Any ideas would be greatly appreciated.

    Here's my code, thanks in advance:
    Code:
    FileNo = FreeFile
    Open TESTFILE For Input As #FileNo
    StartTime = Now()
    i = 0
    Do While Not EOF(FileNo)
        Line Input #FileNo, Find
        Find = StrConv(Find, vbFromUnicode)
        Label2.Caption = InStrB(API_Merri, Find)
        i = i + 1
        If Int(i / 1000) = i / 1000 Then
            Debug.Print i & " - " & Format(Now() - StartTime, "hh:mm:ss")
            StartTime = Now()
        End If
    Loop
    Close #FileNo

  22. #22
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    If the string to find is further down in the file you search, then it takes longer to find. So if your later search keywords generally occur towards the end of the searched file, finding them does take longer.

  23. #23
    New Member
    Join Date
    Mar 2010
    Location
    Reading, UK
    Posts
    2

    Re: Find a string in a text file Fast....

    Yup, spot on, I reversed the order of one of the files and the search started slow and got faster and faster...cheers, invaluable help.

  24. #24
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    Note that this may also mean there is room for further optimization. If the keywords are always found in file order, you don't need to search from the beginning of the file each time; simply continue from the last position. Alternatively, if possible, sort the keywords so that they are found in the order they appear in the file.

    On another note, as things keep getting faster you don't want to update Label2.Caption on every loop iteration, because interacting with controls is slow. It may seem small, but a lot happens each time you change something in a control (drawing to the screen, string storage, etc.).
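    A sketch of "continue from the last position", assuming the keywords are supplied in the order they occur in the data. Each search starts just past the previous hit, so a keyword that only appears earlier in the data is reported as not found (0). FindInOrder is a hypothetical name.

    ```vb
    ' Hypothetical helper: searches each keyword starting where the previous one was found.
    ' Fills lngPositions() with the hit positions (0 = not found) and returns the number of hits.
    Public Function FindInOrder(strData As String, strKeys() As String, lngPositions() As Long) As Long
        Dim i As Long, lngFrom As Long, lngHit As Long, lngFound As Long

        ReDim lngPositions(LBound(strKeys) To UBound(strKeys))
        lngFrom = 1
        For i = LBound(strKeys) To UBound(strKeys)
            lngHit = InStr(lngFrom, strData, strKeys(i), vbBinaryCompare)
            lngPositions(i) = lngHit
            If lngHit > 0 Then
                lngFound = lngFound + 1
                lngFrom = lngHit + Len(strKeys(i))   ' next search resumes after this match
            End If
        Next
        FindInOrder = lngFound
    End Function
    ```
    
    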

  25. #25
    Fanatic Member coolcurrent4u
    Join Date
    Apr 2008
    Location
    *****
    Posts
    993

    Re: Find a string in a text file Fast....

    Merri, I like your code and want to use it in a project, but I need to search for multiple keywords. Can I do this without looping, as I have many keywords I want to search for at once? Could you also let me know what performance issues I should expect?
    Last edited by coolcurrent4u; Feb 4th, 2011 at 07:24 AM.
    Programming is all about good logic. Spend more time here


    (Generate pronounceable password) (Generate random number c#) (Filter array with another array)

  26. #26
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    It would require more complex code than that. You can't use InStr because it always looks for a single keyword, so you'd be forced to loop through everything multiple times.

    To make it more efficient and truly go through the data just once, you'd need to:
    1) sort the keywords;
    2) do the string matching manually against the keyword list;
    3) take advantage of the sorting: you don't need to check against every keyword, just continue until you have either a perfect match, or only a partial match where the next keyword can't match;
    4) apply a search algorithm such as binary search, which requires sorted keywords and should make things quite fast.
    You'll then do only a couple of lookups in the keyword list instead of going through all the keywords. That is the power of sorting plus a good search algorithm.
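    A simplified single-pass sketch of the sort + binary search idea, under the assumption that keywords are whole words made of letters, digits and underscore. The keyword array is sorted once (a plain insertion sort here; a quicksort would be better for large lists), the text is split into words on the fly, and each word is looked up with a binary search instead of being compared against every keyword. This is a variant rather than the partial-match scheme described above, and all names are hypothetical.

    ```vb
    ' Sorts strArr() in place (binary compare order). Simple, but O(n^2) in the worst case.
    Private Sub InsertionSort(strArr() As String)
        Dim i As Long, j As Long, strTmp As String
        For i = LBound(strArr) + 1 To UBound(strArr)
            strTmp = strArr(i)
            j = i - 1
            Do While j >= LBound(strArr)
                If StrComp(strArr(j), strTmp, vbBinaryCompare) <= 0 Then Exit Do
                strArr(j + 1) = strArr(j)
                j = j - 1
            Loop
            strArr(j + 1) = strTmp
        Next
    End Sub

    ' Binary search in a sorted array; returns the index, or -1 (assumes LBound >= 0).
    Private Function BinarySearch(strSorted() As String, ByVal strKey As String) As Long
        Dim lngLo As Long, lngHi As Long, lngMid As Long, lngCmp As Long
        lngLo = LBound(strSorted): lngHi = UBound(strSorted)
        Do While lngLo <= lngHi
            lngMid = (lngLo + lngHi) \ 2
            lngCmp = StrComp(strSorted(lngMid), strKey, vbBinaryCompare)
            If lngCmp = 0 Then BinarySearch = lngMid: Exit Function
            If lngCmp < 0 Then lngLo = lngMid + 1 Else lngHi = lngMid - 1
        Loop
        BinarySearch = -1
    End Function

    ' Hypothetical helper: counts the words in strText that are in strKeys().
    ' Note: strKeys() is sorted in place as a side effect.
    Public Function CountKeywordHits(strText As String, strKeys() As String) As Long
        Dim i As Long, lngStart As Long, strCh As String
        InsertionSort strKeys
        For i = 1 To Len(strText) + 1
            If i <= Len(strText) Then strCh = Mid$(strText, i, 1) Else strCh = " "
            If strCh Like "[A-Za-z0-9_]" Then
                If lngStart = 0 Then lngStart = i            ' a word begins here
            ElseIf lngStart > 0 Then                         ' a word just ended at i - 1
                If BinarySearch(strKeys, Mid$(strText, lngStart, i - lngStart)) >= 0 Then
                    CountKeywordHits = CountKeywordHits + 1
                End If
                lngStart = 0
            End If
        Next
    End Function
    ```
    
    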

  27. #27
    Fanatic Member coolcurrent4u
    Join Date
    Apr 2008
    Location
    *****
    Posts
    993

    Re: Find a string in a text file Fast....

    I developed some code, shown in my signature (Filter array with another array). Do you think I can apply the same technique without much looping?

  28. #28
    Lively Member
    Join Date
    Mar 2007
    Location
    Illinois, USA
    Posts
    85

    Re: Find a string in a text file Fast....

    Merri

    I'm getting 'Run-time error 9: Subscript out of range' at the line indicated in the code below:
    Code:
    Public Function ApiReadFile(ByVal strFilename As String, ByVal strStringToFind As String) As String
        Dim hFile As Long, bContent() As Byte
        Dim FileLenght As Long, Result As Long
        
        hFile = CreateFile(strFilename, GENERIC_READ, FILE_SHARE_READ Or FILE_SHARE_WRITE, ByVal 0&, OPEN_EXISTING, 0, 0)
        FileLenght = GetFileSize(hFile, 0)
        
        SetFilePointer hFile, 0, 0, FILE_BEGIN
        
        ReDim bContent(1 To FileLenght) As Byte    '<<--- Error 9
        
        ReadFile hFile, bContent(1), UBound(bContent), Result, ByVal 0&
        If Result <> UBound(bContent) Then MsgBox "Error reading file ..."
        
        CloseHandle hFile
        
        ApiReadFile = StrConv(bContent, vbUnicode)
        
        Label1.Caption = InStr(ApiReadFile, strStringToFind) 'Mid(ApiReadFile, InStrRev(ApiReadFile, vbNewLine, InStr(1, ApiReadFile, strStringToFind)), (InStr(InStr(1, ApiReadFile, strStringToFind), ApiReadFile, vbNewLine)) - InStrRev(ApiReadFile, vbNewLine, InStr(1, ApiReadFile, strStringToFind)))
        
        ReDim bContent(0) As Byte
        
    End Function
    I have not altered your project, just tested it.

  29. #29
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    What is the size of the tested file? If it is 0 bytes then the code fails as it does not check if the length is valid.

    On the other hand, I can't recall whether the code creates a file. If it does, make sure the project is located in a folder you have write access to (Vista and 7 aren't as "nice" as XP about file permissions).
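    A guarded version of the CreateFile/ReadFile read, as a sketch: it refuses a missing file (invalid handle) and a 0-byte file before any ReDim, which are the two cases where ReDim bContent(1 To FileLenght) in the code above ends up with an invalid range and raises error 9. The Declare statements are the usual kernel32 signatures; GetFileSize is called with a NULL high-word pointer, so files of 2 GB or more are rejected rather than handled. TryReadWholeFile is a hypothetical name.

    ```vb
    Private Declare Function CreateFile Lib "kernel32" Alias "CreateFileA" ( _
        ByVal lpFileName As String, ByVal dwDesiredAccess As Long, ByVal dwShareMode As Long, _
        ByVal lpSecurityAttributes As Long, ByVal dwCreationDisposition As Long, _
        ByVal dwFlagsAndAttributes As Long, ByVal hTemplateFile As Long) As Long
    Private Declare Function GetFileSize Lib "kernel32" (ByVal hFile As Long, ByVal lpFileSizeHigh As Long) As Long
    Private Declare Function ReadFile Lib "kernel32" (ByVal hFile As Long, lpBuffer As Any, _
        ByVal nNumberOfBytesToRead As Long, lpNumberOfBytesRead As Long, ByVal lpOverlapped As Long) As Long
    Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long

    Private Const GENERIC_READ As Long = &H80000000
    Private Const FILE_SHARE_READ As Long = &H1
    Private Const FILE_SHARE_WRITE As Long = &H2
    Private Const OPEN_EXISTING As Long = 3
    Private Const INVALID_HANDLE_VALUE As Long = -1

    ' Returns True and fills bytData() only if the file exists, is non-empty and was read completely.
    Public Function TryReadWholeFile(ByVal strFileName As String, bytData() As Byte) As Boolean
        Dim hFile As Long, lngSize As Long, lngRead As Long

        hFile = CreateFile(strFileName, GENERIC_READ, FILE_SHARE_READ Or FILE_SHARE_WRITE, 0&, OPEN_EXISTING, 0&, 0&)
        If hFile = INVALID_HANDLE_VALUE Then Exit Function    ' missing or inaccessible file
        lngSize = GetFileSize(hFile, 0&)                      ' low 32 bits only; negative means >= 2 GB or error
        If lngSize > 0 Then                                   ' 0-byte file: nothing to ReDim or read
            ReDim bytData(0 To lngSize - 1)
            If ReadFile(hFile, bytData(0), lngSize, lngRead, 0&) <> 0 Then
                TryReadWholeFile = (lngRead = lngSize)
            End If
        End If
        CloseHandle hFile
    End Function
    ```
    
    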

  30. #30
    Lively Member
    Join Date
    Mar 2007
    Location
    Illinois, USA
    Posts
    85

    Re: Find a string in a text file Fast....

    I manually created the file:
    Code:
    Const TESTFILE = "C:\Test.txt"
    It contains 4 bytes.
    Same error.

    The app didn't create it.

  31. #31
    PowerPoster Nightwalker83
    Join Date
    Dec 2001
    Location
    Adelaide, Australia
    Posts
    13,344

    Re: Find a string in a text file Fast....

    @ Merri

    I noticed that if I create the file "C:\Test.txt", leave it empty and then run your code, I get a message box saying:

    "File too big: 0.000 gigabytes". Shouldn't that be "file too small"?

    Quote, originally posted by Aaron02:
    I manually created the file:
    Code:
    Const TESTFILE = "C:\Test.txt"
    It contains 4 bytes.
    Same error.

    The app didn't create it.
    Did you write something in the file then save it?
    when you quote a post could you please do it via the "Reply With Quote" button or if it multiple post click the "''+" button then "Reply With Quote" button.
    If this thread is finished with please mark it "Resolved" by selecting "Mark thread resolved" from the "Thread tools" drop-down menu.

  32. #32
    Lively Member
    Join Date
    Mar 2007
    Location
    Illinois, USA
    Posts
    85

    Re: Find a string in a text file Fast....

    It contains the word Test

  33. #33
    VB6, XHTML & CSS hobbyist Merri
    Join Date
    Oct 2002
    Location
    Finland
    Posts
    6,654

    Re: Find a string in a text file Fast....

    Aaron02: now that I've had time to download the sample, I noticed that the code you refer to is Jmacp's original code, and you get a "Subscript out of range" error if the file is not there. My code checks for a valid handle and tells you it could not find the file, so clicking the second button first should show you this.

    Nightwalker83: the bug is there but considering the nature of the sample it shouldn't matter that much: it is quite a small change to fix the problem and a rewrite is required for use in other purposes.

  34. #34
    New Member
    Join Date
    Apr 2011
    Posts
    1

    Re: Find a string in a text file Fast....

    Hello Merri,
    Is there a NOT CASE SENSITIVE version of API_Merri?
    Thx in advance, RJ
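    No case-insensitive version is posted in the thread. One possible approach, sketched here under the assumption of plain ASCII text: lower-case A-Z in the data bytes once, lower-case the search term the same way, then use InStrB as before. Accented ANSI letters are not folded; for locale-aware matching, InStr with vbTextCompare on a converted (Unicode) string is an alternative, at the cost of the conversion. LowerAsciiBytes and InStrBNoCase are hypothetical names.

    ```vb
    ' Lower-cases A-Z in place (ASCII only; other bytes are left unchanged).
    Public Sub LowerAsciiBytes(bytData() As Byte)
        Dim i As Long
        For i = LBound(bytData) To UBound(bytData)
            If bytData(i) >= 65 And bytData(i) <= 90 Then bytData(i) = bytData(i) + 32
        Next
    End Sub

    ' Hypothetical helper: ASCII case-insensitive InStrB.
    ' strLoweredData must be a byte string already passed through LowerAsciiBytes.
    Public Function InStrBNoCase(strLoweredData As String, ByVal strFind As String) As Long
        Dim bytFind() As Byte, strFindB As String
        If LenB(strFind) = 0 Then Exit Function
        bytFind = StrConv(strFind, vbFromUnicode)   ' search term as ANSI bytes
        LowerAsciiBytes bytFind                      ' fold it the same way as the data
        strFindB = bytFind                           ' raw copy, no conversion
        InStrBNoCase = InStrB(1, strLoweredData, strFindB, vbBinaryCompare)
    End Function
    ```

    Lower-casing the data once and keeping it in a byte string avoids repeating that work for every search term.
    
    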

  35. #35
    New Member
    Join Date
    Sep 2012
    Posts
    1

    Re: Find a string in a text file Fast....

    Hi all

    It needs to be in VB6.

    How do you get it to keep searching so that it finds the last matching entry? Currently it only finds the first instance.
    I.e. to show the last GetAccountId for "30180AV", which should be 0001769011GjRH9BkX58roI7eAmCurl6G6q7C5yYzJ6lwS21oJ.

    The only way I can think of is loading the text file into a string array, reversing the array with another loop,
    then searching.

    But there must be an easier way.

    Thanks


    A large txt file with a layout similar to this, plus other stuff in it:

    2012-09-25 15:01:03,421 INFO - 25 September 2012 15:01:03.421 +01:00 : [0001768998r8Vfb4B6ZajavFqNMWZwSbm6QsfJuLwVoHcRIWx0] POST - GetAccountId(0001768998r8Vfb4B6ZajavFqNMWZwSbm6QsfJuLwVoHcRIWx0) returning "19678JU"
    2012-09-25 15:02:47,093 INFO - 25 September 2012 15:02:47.093 +01:00 : [0001768998r8Vfb4B6ZajavFqNMWZwSbm6QsfJuLwVoHcRIWx0] POST - GetAccountId(0001768998r8Vfb4B6ZajavFqNMWZwSbm6QsfJuLwVoHcRIWx0) returning "19678JU"
    2012-09-25 15:02:53,468 INFO - 25 September 2012 15:02:53.468 +01:00 : [0001768998r8Vfb4B6ZajavFqNMWZwSbm6QsfJuLwVoHcRIWx0] POST - GetAccountId(0001768998r8Vfb4B6ZajavFqNMWZwSbm6QsfJuLwVoHcRIWx0) returning "19678JU"
    2012-09-25 15:03:00,250 INFO - 25 September 2012 15:03:00.250 +01:00 : [0001768998r8Vfb4B6ZajavFqNMWZwSbm6QsfJuLwVoHcRIWx0] POST - GetAccountId(0001768998r8Vfb4B6ZajavFqNMWZwSbm6QsfJuLwVoHcRIWx0) returning "19678JU"
    2012-09-25 15:03:27,656 INFO - 25 September 2012 15:03:27.656 +01:00 : [0001768998r8Vfb4B6ZajavFqNMWZwSbm6QsfJuLwVoHcRIWx0] POST - GetAccountId(0001768998r8Vfb4B6ZajavFqNMWZwSbm6QsfJuLwVoHcRIWx0) returning "19678JU"
    2012-09-25 15:05:17,265 INFO - 25 September 2012 15:05:17.265 +01:00 : [0001769003uH4h8ZWtQEur2kyZbBeaRA10W0FC8L1Da6ifaAB8] POST - GetAccountId(0001769003uH4h8ZWtQEur2kyZbBeaRA10W0FC8L1Da6ifaAB8) returning "30180AV"
    2012-09-25 15:12:50,734 INFO - 25 September 2012 15:12:50.734 +01:00 : [0001769003uH4h8ZWtQEur2kyZbBeaRA10W0FC8L1Da6ifaAB8] POST - GetAccountId(0001769003uH4h8ZWtQEur2kyZbBeaRA10W0FC8L1Da6ifaAB8) returning "30180AV"
    2012-09-25 15:31:06,703 INFO - 25 September 2012 15:31:06.703 +01:00 : [0001769011GjRH9BkX58roI7eAmCurl6G6q7C5yYzJ6lwS21oJ] POST - GetAccountId(0001769011GjRH9BkX58roI7eAmCurl6G6q7C5yYzJ6lwS21oJ) returning "30180AV"
    2012-09-25 15:31:57,250 INFO - 25 September 2012 15:31:57.250 +01:00 : [0001769013Lpp8tvJXEadblzvvdyLy1CMWHQcV7fduWVjHGz4n] POST - GetAccountId(0001769013Lpp8tvJXEadblzvvdyLy1CMWHQcV7fduWVjHGz4n) returning "30180AV"
    2012-09-25 15:34:35,593 INFO - 25 September 2012 15:34:35.593 +01:00 : [0001769009A3cuBbG20shtUgPv8y5uBBgTbfCc8k19BedVN1dL] POST - GetAccountId(0001769009A3cuBbG20shtUgPv8y5uBBgTbfCc8k19BedVN1dL) returning "91896YY"
    2012-09-25 15:34:41,828 INFO - 25 September 2012 15:34:41.828 +01:00 : [0001769009A3cuBbG20shtUgPv8y5uBBgTbfCc8k19BedVN1dL] POST - GetAccountId(0001769009A3cuBbG20shtUgPv8y5uBBgTbfCc8k19BedVN1dL) returning "91896YY"
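    Instead of reversing the file, InStrRev can search backwards from the end of the data, so the first hit it returns is the last occurrence. A sketch, assuming the whole log is already in one ordinary VB string (for example read in Binary mode and converted with StrConv(..., vbUnicode)) and that the wanted ID is the first [...] on the matching line. LastBracketIdFor is a hypothetical name, and the test data is made up.

    ```vb
    ' Hypothetical helper: text between "[" and "]" on the LAST line containing strKey, or "".
    ' strData is passed ByRef so the whole log is not copied.
    Public Function LastBracketIdFor(strData As String, ByVal strKey As String) As String
        Dim lngHit As Long, lngLineStart As Long, lngOpen As Long, lngClose As Long

        lngHit = InStrRev(strData, strKey, -1, vbBinaryCompare)   ' search backwards from the end
        If lngHit = 0 Then Exit Function
        lngLineStart = InStrRev(strData, vbLf, lngHit) + 1         ' start of that line (1 on the first line)
        lngOpen = InStr(lngLineStart, strData, "[")
        If lngOpen = 0 Or lngOpen > lngHit Then Exit Function     ' no "[" on the matching line
        lngClose = InStr(lngOpen, strData, "]")
        If lngClose = 0 Or lngClose > lngHit Then Exit Function
        LastBracketIdFor = Mid$(strData, lngOpen + 1, lngClose - lngOpen - 1)
    End Function
    ```

    Searching for the whole token, e.g. LastBracketIdFor(strLog, "returning ""30180AV"""), ties the match to the end of a log line rather than to any other place the account code might appear.
    
    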
