Thread: Replace string in a Huge file more > 4GB

  1. #1

    Thread Starter
    New Member
    Join Date
    Sep 2017
    Posts
    15

    Question Replace string in a Huge file more > 4GB

    Hi,
    I need a way to replace one string with another in a huge file (over 4 GB, possibly much larger).
    If possible, it should work with DoEvents or some other technique that avoids the "Not Responding" problem.
    I found some threads on the forum about reading large files, but I can't get them to work correctly.
    Thanks

  2. #2
    PowerPoster
    Join Date
    Jun 2015
    Posts
    2,224

    Re: Replace string in a Huge file more > 4GB

    Post your code and I'm sure someone can modify or fix what you've tried.

  3. #3
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,852

    Re: Replace string in a Huge file more > 4GB

    Hi Astarali,

    It seems like part of your problem is actually "finding" the string. Here's a binary file search that I often use:

    Code:
    
    Option Explicit
    
    Public Function BinaryFileSearch(sFileSpec As String, sSearchString As String, Optional bCaseSensitive As Boolean = True, _
                                     Optional lStartPosition As Long = 1, Optional lFoundPosition As Long, _
                                     Optional lFileHandleToUse As Long = 0) As Boolean
        ' Returns true if sSearchString is found, else false.
        ' sSearchString can be no longer than 128.
        ' This will work even if Word or Excel has the file open.
        ' The lFoundPosition is a return argument.
        '    It returns the latest position before lStartPosition (if there isn't one after lStartPosition) or
        '    it returns the earliest position after lStartPosition.
        Dim iFle As Long
        Dim FileData As String
        Dim FilePointer As Long
        Dim FileLength As Long
        Dim sFind As String
        Dim iPos As Long
        '
        If Len(sSearchString) > 128 Then
            Err.Raise 1234
            Exit Function
        End If
        '
        If lFileHandleToUse = 0 Then
            If Not bFileExists(sFileSpec) Then Exit Function
            iFle = FreeFile
            On Error Resume Next
                Open sFileSpec For Binary As iFle
                If Err <> 0 Then
                    Close iFle
                    On Error GoTo 0
                    Exit Function
                End If
            On Error GoTo 0
            '
            If LOF(iFle) = 0 Then Close iFle: Exit Function ' Nothing to search in an empty file.
        Else
            iFle = lFileHandleToUse ' The file MUST be opened BINARY for this to work.
        End If
        '
        If bCaseSensitive Then
            sFind = sSearchString
        Else
            sFind = LCase$(sSearchString)
        End If
        FileData = Space(1024)
        FileLength = LOF(iFle)
        FilePointer = lStartPosition
        Do
            If FilePointer > FileLength Then Exit Do
            Get iFle, FilePointer, FileData
            If Not bCaseSensitive Then FileData = LCase$(FileData)
            iPos = InStr(FileData, sFind)
            If iPos <> 0 Then
                lFoundPosition = FilePointer + iPos - 1
                If lFoundPosition >= lStartPosition Then
                    BinaryFileSearch = True
                    Exit Do
                End If
            End If
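            ' Advance by one buffer, backing up Len(sFind) - 1 characters so that
            ' a match straddling two reads is still found on the next pass.
            FilePointer = ((FilePointer + 1024) - Len(sFind)) + 1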
        Loop
        If lFileHandleToUse = 0 Then Close iFle
    End Function
    
    Public Function bFileExists(fle As String) As Boolean
        On Error GoTo FileExistsError
        ' If no error then something existed.
        bFileExists = (GetAttr(fle) And vbDirectory) = 0
        Exit Function
    FileExistsError:
        bFileExists = False
        Exit Function
    End Function
    
    
    Now, I'm sure there are better ones out there, as I'm doing a lot of converting binary to Unicode and vice-versa. That's certainly one improvement that could be made. However, I have great confidence that this one works.

    Now, after you've gotten your search string's position, it's then just an easy matter of re-opening your file with "Open sFileSpec For Binary", using your string's location and length, and moving things around a bit. If the search string and the replace string are the same size, you don't even have to do that.

    Good Luck,
    Elroy

    EDIT1: However, since your files are so large, you will have to be careful to move things around in "chunks". You don't want to blow up your memory.
    Last edited by Elroy; Dec 5th, 2017 at 02:46 PM.
    Any software I post in these forums written by me is provided "AS IS" without warranty of any kind, expressed or implied, and permission is hereby granted, free of charge and without restriction, to any person obtaining a copy. To all, peace and happiness.

  4. #4
    The Idiot
    Join Date
    Dec 2014
    Posts
    2,721

    Re: Replace string in a Huge file more > 4GB

    I would convert the string to binary (a byte array) and use InStrB, because these are huge data files and a byte-wise search is faster.
    Also, if you read in steps, the string could straddle two reads, so each read needs to start a little before the end of the previous one to catch a string that falls in between.
    Remember to convert the search string to match the file: lower case, upper case, Unicode, etc.
    If the case is unknown, you also have to convert the data to lower or upper case while searching, and that will slow the search down.
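
    A minimal sketch of the overlapping-read idea above, using byte arrays and InStrB. FindBytesInFile is a made-up name, the search string is assumed to be ANSI, and because it uses VB's native Get with a Long position it only works on files under 2 GB; for bigger files the Get call would have to be swapped for an API-based read.

    ```vb
    Option Explicit

    ' Hypothetical helper, not from the thread. Returns the 1-based byte position
    ' of the first match at or after lStart, or 0 if there is none.
    Public Function FindBytesInFile(ByVal sFileSpec As String, ByVal sFind As String, _
                                    Optional ByVal lStart As Long = 1) As Long
        Const CHUNK As Long = 65536
        Dim bFind() As Byte
        Dim bBuf() As Byte
        Dim iFle As Integer
        Dim nFind As Long
        Dim lLen As Long
        Dim lPos As Long
        Dim lRead As Long
        Dim lHit As Long

        If Len(sFind) = 0 Then Exit Function
        bFind = StrConv(sFind, vbFromUnicode)       ' ANSI bytes of the search text
        nFind = UBound(bFind) + 1
        If nFind > CHUNK Then Exit Function         ' pattern must fit in one read
        If lStart < 1 Then lStart = 1

        iFle = FreeFile
        Open sFileSpec For Binary Access Read As #iFle
        lLen = LOF(iFle)
        lPos = lStart
        Do While lPos <= lLen - nFind + 1
            lRead = lLen - lPos + 1
            If lRead > CHUNK Then lRead = CHUNK
            ReDim bBuf(0 To lRead - 1)
            Get #iFle, lPos, bBuf
            lHit = InStrB(1, bBuf, bFind, vbBinaryCompare)
            If lHit > 0 Then
                FindBytesInFile = lPos + lHit - 1
                Exit Do
            End If
            ' Step forward, but re-read the last nFind - 1 bytes so a match
            ' that straddles two reads is still seen.
            lPos = lPos + lRead - nFind + 1
        Loop
        Close #iFle
    End Function
    ```

    The overlap of nFind - 1 bytes is the smallest that still catches every straddling match; a larger chunk size mostly just trades memory for fewer reads.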

  5. #5
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Replace string in a Huge file more > 4GB

    @Elroy. VB's file functions don't do 4GB files, correct?

    FYI to anyone, here's a class created by dilettante for accessing large files
    http://www.vbforums.com/showthread.p...File-I-O-Class
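
    For reference, a minimal sketch of reading a chunk at a 64-bit offset through the Win32 API, which is the kind of approach a large-file class can take. It is not dilettante's class, and ReadChunkAt is a made-up name. It assumes Windows XP or later for SetFilePointerEx and relies on the usual VB6 trick of passing a Currency where the API expects a 64-bit integer.

    ```vb
    Option Explicit

    Private Const GENERIC_READ As Long = &H80000000
    Private Const FILE_SHARE_READ As Long = &H1
    Private Const OPEN_EXISTING As Long = 3
    Private Const INVALID_HANDLE_VALUE As Long = -1
    Private Const FILE_BEGIN As Long = 0

    Private Declare Function CreateFileW Lib "kernel32" (ByVal lpFileName As Long, _
        ByVal dwDesiredAccess As Long, ByVal dwShareMode As Long, _
        ByVal lpSecurityAttributes As Long, ByVal dwCreationDisposition As Long, _
        ByVal dwFlagsAndAttributes As Long, ByVal hTemplateFile As Long) As Long
    Private Declare Function SetFilePointerEx Lib "kernel32" (ByVal hFile As Long, _
        ByVal liDistanceToMove As Currency, ByRef lpNewFilePointer As Currency, _
        ByVal dwMoveMethod As Long) As Long
    Private Declare Function ReadFile Lib "kernel32" (ByVal hFile As Long, _
        ByRef lpBuffer As Any, ByVal nNumberOfBytesToRead As Long, _
        ByRef lpNumberOfBytesRead As Long, ByVal lpOverlapped As Long) As Long
    Private Declare Function CloseHandle Lib "kernel32" (ByVal hObject As Long) As Long

    ' Hypothetical helper: reads up to lCount bytes starting at the 0-based byte
    ' offset cOffset and returns how many bytes were actually read (0 on failure
    ' or at end of file). cOffset holds a whole number of bytes.
    Public Function ReadChunkAt(ByVal sFileSpec As String, ByVal cOffset As Currency, _
                                ByVal lCount As Long, ByRef bBuf() As Byte) As Long
        Dim hFile As Long
        Dim cNewPos As Currency
        Dim lRead As Long

        If lCount < 1 Then Exit Function
        hFile = CreateFileW(StrPtr(sFileSpec), GENERIC_READ, FILE_SHARE_READ, _
                            0, OPEN_EXISTING, 0, 0)
        If hFile = INVALID_HANDLE_VALUE Then Exit Function
        ReDim bBuf(0 To lCount - 1)
        ' A Currency is a 64-bit integer scaled by 10000, so dividing by 10000
        ' makes its raw bits equal to the byte offset the API expects.
        If SetFilePointerEx(hFile, cOffset / 10000@, cNewPos, FILE_BEGIN) <> 0 Then
            If ReadFile(hFile, bBuf(0), lCount, lRead, 0) <> 0 Then ReadChunkAt = lRead
        End If
        CloseHandle hFile
    End Function
    ```

    For example, n = ReadChunkAt("C:\big.dat", 5000000000@, 65536, buf) would try to read 64 KB starting about 5 GB into a (hypothetical) file. Opening and closing the handle on every call keeps the sketch short; a real routine would keep the handle open across reads.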
    Insomnia is just a byproduct of, "It can't be done"

    Classics Enthusiast? Here's my 1969 Mustang Mach I Fastback. Her sister '67 Coupe has been adopted

    Newbie? Novice? Bored? Spend a few minutes browsing the FAQ section of the forum.
    Read the HitchHiker's Guide to Getting Help on the Forums.
    Here is the list of TAGs you can use to format your posts
    Here are VB6 Help Files online


    {Alpha Image Control} {Memory Leak FAQ} {Unicode Open/Save Dialog} {Resource Image Viewer/Extractor}
    {VB and DPI Tutorial} {Manifest Creator} {UserControl Button Template} {stdPicture Render Usage}

  6. #6
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,852

    Re: Replace string in a Huge file more > 4GB

    Quote Originally Posted by baka View Post
    if you read in "steps", the string could be between two reads
    Hi Baka. My function actually deals with this.

    Quote Originally Posted by LaVolpe View Post
    @Elroy. VB's file functions don't do 4GB files, correct?
    Hi LaVolpe. Yep, this one I actually forgot about. Astarali, LaVolpe has some good file open/read routines for reading larger files. If I were doing this, I'd definitely check into his work.


    Take Care,
    Elroy

    EDIT1: And yeah, I already mentioned that I was doing some unnecessary Unicode conversion.

  7. #7
    The Idiot
    Join Date
    Dec 2014
    Posts
    2,721

    Re: Replace string in a Huge file more > 4GB

    My post wasn't meant as a complaint about yours, Elroy. I just wanted to mention to the OP how this can be done, including the use of InStrB, as I have worked on this myself a few times. In my own class I use the API that LaVolpe pointed out, together with IsWindowUnicode and CreateFileA/CreateFileW.

    Also, a string search can be done in several ways.
    Is there a specific structure? For example, are there zero bytes before and after the string? Including the 00 bytes before and after the string in the search can speed it up, and it returns only that exact string rather than part of a longer word: searching for "hole" also matches "whole" and anything else containing "hole", but searching for 0-hole-0 matches only "hole". Or is it something else?
    You need to know. If the structure is unknown, then of course we do a simple string search and that's it. But if it is known, it's better to follow the data structure.

    The search can also be done byte by byte when there are different conditions to check: we search for the first byte and then compare the following bytes against all the variations,
    for example:
    if data(1) = sdata(x+1) then
    if data(2) = sdata(x+2) or data(2) + 32 = sdata(x+2) then etc...
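
    The byte-by-byte comparison above can be sketched as a small function. This is only an illustration: MatchAtNoCase, bData and bPattern are made-up names, both arrays are assumed to be 0-based, and only the ASCII letters A-Z and a-z are folded. Note that adding 32 to one side alone only catches an upper-case pattern against lower-case data, so this version folds both sides before comparing.

    ```vb
    Option Explicit

    ' Hypothetical helper: True if bPattern matches bData starting at index lAt,
    ' treating ASCII upper-case and lower-case letters as equal.
    Public Function MatchAtNoCase(bData() As Byte, ByVal lAt As Long, _
                                  bPattern() As Byte) As Boolean
        Dim i As Long
        Dim a As Long
        Dim b As Long

        ' The whole pattern must fit inside the data from lAt onward.
        If lAt < 0 Then Exit Function
        If lAt + UBound(bPattern) > UBound(bData) Then Exit Function

        For i = 0 To UBound(bPattern)
            a = bData(lAt + i)
            b = bPattern(i)
            ' Fold upper-case ASCII (65-90) to lower case (97-122).
            If a >= 65 And a <= 90 Then a = a + 32
            If b >= 65 And b <= 90 Then b = b + 32
            If a <> b Then Exit Function
        Next i
        MatchAtNoCase = True
    End Function
    ```

    The values are copied into Longs first so that adding 32 cannot overflow a Byte.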

  8. #8
    PowerPoster Elroy's Avatar
    Join Date
    Jun 2014
    Location
    Near Nashville TN
    Posts
    9,852

    Re: Replace string in a Huge file more > 4GB

    Quote Originally Posted by baka View Post
    My post wasn't meant as a complaint about yours, Elroy. I just wanted to mention to the OP how this can be done, including the use of InStrB, as I have worked on this myself a few times. In my own class I use the API that LaVolpe pointed out, together with IsWindowUnicode and CreateFileA/CreateFileW.

    Yeah, I didn't really take any offense. I was just pointing out for Astarali that that particular base had been covered.

    And yeah, there really is more to consider here than Astarali is suggesting. Is the string we're searching for Unicode? If so, is it UCS-2, or possibly the expanded UTF-16, or possibly some other variant of Unicode? If it's ANSI, do we need to worry about a particular codepage? Do we always want to find a terminating null? These are all unanswered questions.

    For me, I wrote that thing to search for ASCII strings primarily in Excel files. I just wanted a way to identify certain types of Excel files (created by my code, but with user-given names). It does a good job of that. However, I've never used it as a search-and-replace function.

    Best Regards,
    Elroy

  9. #9
    VB-aholic & Lovin' It LaVolpe's Avatar
    Join Date
    Oct 2007
    Location
    Beside Waldo
    Posts
    19,541

    Re: Replace string in a Huge file more > 4GB

    If I remember to check tomorrow, I can post some tidbits to help along. At my job, I developed a search (not replace) routine to find ASCII strings in very large files (4GB+), and it is pretty fast. As for replacing, that's relatively simple but typically requires double the disk space. Once the string is found, transfer all bytes up to it to another file, write the new string, and then transfer all bytes after the found string. Of course, if the string being replaced is larger than its replacement, one could write the changes to the same file, shift all the remaining bytes down, and then truncate the file. There are probably gotchas with truncating files, so I'll leave that to others for discussion.
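
    A rough sketch of the copy-to-another-file approach described above, not LaVolpe's actual routine. ReplaceToNewFile and WriteBytes are made-up names, the strings are assumed to be ANSI, and because it uses VB's native Get/Put it only handles source and target files under 2 GB; for larger files the reads and writes would need API-based equivalents. The DoEvents call once per chunk is there to keep the UI responsive.

    ```vb
    Option Explicit

    ' Hypothetical helper: copies sSource to sTarget, replacing every occurrence
    ' of sFind with sReplace, and returns the number of replacements made.
    Public Function ReplaceToNewFile(ByVal sSource As String, ByVal sTarget As String, _
                                     ByVal sFind As String, ByVal sReplace As String) As Long
        Const CHUNK As Long = 65536
        Dim sFindB As String, sReplB As String, sBuf As String
        Dim bBuf() As Byte
        Dim iIn As Integer, iOut As Integer
        Dim nFind As Long, lLen As Long, lPos As Long, lRead As Long
        Dim lScan As Long, lHit As Long, lDone As Long
        Dim bLast As Boolean

        sFindB = StrConv(sFind, vbFromUnicode)      ' ANSI bytes carried in a String
        sReplB = StrConv(sReplace, vbFromUnicode)
        nFind = LenB(sFindB)
        If nFind = 0 Or nFind > CHUNK Then Err.Raise 5      ' invalid argument
        If Len(Dir$(sTarget)) > 0 Then Err.Raise 58         ' target already exists

        iIn = FreeFile
        Open sSource For Binary Access Read As #iIn
        iOut = FreeFile
        Open sTarget For Binary Access Write As #iOut
        lLen = LOF(iIn)
        lPos = 1
        Do While lPos <= lLen
            lRead = lLen - lPos + 1
            bLast = (lRead <= CHUNK)
            If Not bLast Then lRead = CHUNK
            ReDim bBuf(0 To lRead - 1)
            Get #iIn, lPos, bBuf
            sBuf = bBuf                             ' raw bytes, searched with InStrB
            lScan = 1
            Do
                lHit = InStrB(lScan, sBuf, sFindB, vbBinaryCompare)
                If lHit = 0 Then Exit Do
                WriteBytes iOut, MidB$(sBuf, lScan, lHit - lScan)   ' bytes before the match
                WriteBytes iOut, sReplB                             ' the new string
                lScan = lHit + nFind
                ReplaceToNewFile = ReplaceToNewFile + 1
            Loop
            If bLast Then
                lDone = lRead                       ' last chunk: flush everything
            Else
                lDone = lRead - nFind + 1           ' hold back a possible partial match
                If lScan - 1 > lDone Then lDone = lScan - 1
            End If
            WriteBytes iOut, MidB$(sBuf, lScan, lDone - lScan + 1)
            lPos = lPos + lDone
            DoEvents
        Loop
        Close #iOut
        Close #iIn
    End Function

    ' Writes the raw bytes held in sBytes at the current position of iFle.
    Private Sub WriteBytes(ByVal iFle As Integer, ByRef sBytes As String)
        Dim b() As Byte
        If LenB(sBytes) = 0 Then Exit Sub
        b = sBytes
        Put #iFle, , b
    End Sub
    ```

    Once the copy has been verified, the original can be deleted and the new file renamed, which is where the "double the disk space" cost comes from.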

  10. #10
    The Idiot
    Join Date
    Dec 2014
    Posts
    2,721

    Re: Replace string in a Huge file more > 4GB

    You should be able to read and write in the same file if the replacement is exactly the same size/length as the original.
    So if you search for "wrong", you could replace it with "false", as both are 5 letters, without needing to create a new file.
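
    A small sketch of that same-length case, assuming the match position is already known (for example from the lFoundPosition argument of BinaryFileSearch earlier in the thread). OverwriteAt is a made-up name, the text is assumed to be ANSI, and the Long position limits it to files under 2 GB.

    ```vb
    Option Explicit

    ' Hypothetical helper: overwrites the bytes at the 1-based position
    ' lFoundPosition with sReplace. Only meaningful when sReplace has exactly
    ' as many bytes as the text it replaces; the file length never changes.
    Public Sub OverwriteAt(ByVal sFileSpec As String, ByVal lFoundPosition As Long, _
                           ByVal sReplace As String)
        Dim iFle As Integer
        Dim bRepl() As Byte

        If Len(sReplace) = 0 Then Exit Sub
        bRepl = StrConv(sReplace, vbFromUnicode)    ' ANSI bytes to write
        iFle = FreeFile
        Open sFileSpec For Binary Access Read Write As #iFle
        ' Refuse to write outside the existing file so it cannot grow by accident.
        If lFoundPosition >= 1 And lFoundPosition + UBound(bRepl) <= LOF(iFle) Then
            Put #iFle, lFoundPosition, bRepl
        End If
        Close #iFle
    End Sub
    ```

    With "wrong" found at position lPos, OverwriteAt sFile, lPos, "false" would patch it in place; sFile and lPos here are placeholders.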

  11. #11
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Replace string in a Huge file more > 4GB

    I didn't see where the character encoding was mentioned. Maybe the files are Unicode, EBCDIC, or who knows what?
