
Thread: ReadFileDirectlyToString

  1. #1

    Thread Starter
    New Member
    Join Date
    Jul 2019
    Posts
    6

    ReadFileDirectlyToString

    Hi everybody,

    I have a problem: I need to read a huge number of files and compare them with nearly identical copies from backups.
    There are some 7+ terabytes in total.

    A long time ago I wrote a program that does this job well and quite fast, but I need it to be faster.
    I also need it to handle Unicode filenames and huge files (>2/4 GB). The old program uses VB6 interfaces and does not support Unicode filenames (I worked around that by using the 8.3 filenames the system provides for the files). It works, but not for huge files.

    Anyway...

    I just need some instructions or a piece of code that can read from a file directly into a string (say, a 256 kB string), without reading into a byte array first, because that byte array has to be transferred to a string anyway for the string comparison...

    I am aware of these:
    Code:
        lSuccess = ReadFile(FileNumber, bbyte(0), lBytesToRead, lBytesRead, 0&)
    and I have used it.

    What I don't know is how to read a file directly into a string. I tried calling ReadFile with StrPtr, and also with the string variable passed directly, but neither works.

    Please help...
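
    For reference, a minimal sketch of one way to read straight into a string. The helper names ReadChunkIntoString and DemoRead are made up for illustration, and hFile is assumed to be a valid handle obtained from CreateFileW. A String passed to an As Any parameter goes through VB6's automatic Unicode/ANSI conversion, so the API never sees the string's own buffer, and StrPtr(s) without ByVal passes the address of a temporary Long rather than the string data. Passing ByVal StrPtr(s) hands ReadFile the address of the character buffer itself.

    ```vb
    Option Explicit

    ' lpBuffer is declared As Any so that a raw address can be passed with ByVal.
    Private Declare Function ReadFile Lib "kernel32" (ByVal hFile As Long, ByRef lpBuffer As Any, ByVal nNumberOfBytesToRead As Long, ByRef lpNumberOfBytesRead As Long, ByVal lpOverlapped As Long) As Long

    ' Hypothetical helper: reads up to LenB(Buffer) bytes from an open handle
    ' directly into the string's own buffer (2 bytes per character).
    Public Function ReadChunkIntoString(ByVal hFile As Long, ByRef Buffer As String, ByRef BytesRead As Long) As Boolean
        If LenB(Buffer) = 0 Then Exit Function      ' nothing allocated, nothing to read into
        ' ByVal StrPtr(Buffer) = address of the UTF-16 data; no conversion, no copy.
        ReadChunkIntoString = ReadFile(hFile, ByVal StrPtr(Buffer), LenB(Buffer), BytesRead, 0&) <> 0
    End Function

    ' Usage sketch: a 256 KB buffer is 131072 two-byte characters.
    Public Sub DemoRead(ByVal hFile As Long)
        Dim s As String, n As Long
        s = String$(131072, vbNullChar)             ' allocate once, reuse for every chunk
        If ReadChunkIntoString(hFile, s, n) Then Debug.Print n & " bytes read"
    End Sub
    ```

    The string then holds raw file bytes, not text, so it is only useful for binary comparison. The last chunk of a file is usually shorter than the buffer, and the rest of the string still holds data from the previous read, so only the first BytesRead bytes should be compared.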

  2. #2
    PowerPoster
    Join Date
    Sep 2006
    Location
    Egypt
    Posts
    2,557

    Re: ReadFileDirectlyToString

    If you only want to know whether the two files are identical, and don't need to know what the difference is or where it is, you can simply compute a hash of each file and compare the two hashes.
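
    As an illustration of the hashing idea, here is a minimal sketch that uses the Windows CryptoAPI to compute a SHA-1 digest of a byte array. The function name Sha1Hex is made up for this example, and the choice of SHA-1 is an assumption (any algorithm the provider supports would do). For a whole file, CryptHashData would be called once per chunk read before the digest is fetched.

    ```vb
    Option Explicit

    Private Declare Function CryptAcquireContextW Lib "advapi32.dll" (ByRef phProv As Long, ByVal pszContainer As Long, ByVal pszProvider As Long, ByVal dwProvType As Long, ByVal dwFlags As Long) As Long
    Private Declare Function CryptCreateHash Lib "advapi32.dll" (ByVal hProv As Long, ByVal Algid As Long, ByVal hKey As Long, ByVal dwFlags As Long, ByRef phHash As Long) As Long
    Private Declare Function CryptHashData Lib "advapi32.dll" (ByVal hHash As Long, ByRef pbData As Any, ByVal dwDataLen As Long, ByVal dwFlags As Long) As Long
    Private Declare Function CryptGetHashParam Lib "advapi32.dll" (ByVal hHash As Long, ByVal dwParam As Long, ByRef pbData As Any, ByRef pdwDataLen As Long, ByVal dwFlags As Long) As Long
    Private Declare Function CryptDestroyHash Lib "advapi32.dll" (ByVal hHash As Long) As Long
    Private Declare Function CryptReleaseContext Lib "advapi32.dll" (ByVal hProv As Long, ByVal dwFlags As Long) As Long

    Private Const PROV_RSA_FULL As Long = 1
    Private Const CRYPT_VERIFYCONTEXT As Long = &HF0000000
    Private Const CALG_SHA1 As Long = &H8004&       ' trailing & keeps the literal a Long
    Private Const HP_HASHVAL As Long = 2

    ' Hypothetical helper: returns the SHA-1 digest of Data as 40 hex digits,
    ' or an empty string on failure. Data must contain at least one byte.
    Public Function Sha1Hex(ByRef Data() As Byte) As String
        Dim hProv As Long, hHash As Long, Digest(0 To 19) As Byte, DigestLen As Long, i As Long
        If CryptAcquireContextW(hProv, 0&, 0&, PROV_RSA_FULL, CRYPT_VERIFYCONTEXT) = 0 Then Exit Function
        If CryptCreateHash(hProv, CALG_SHA1, 0&, 0&, hHash) <> 0 Then
            ' For a file, call CryptHashData repeatedly, once per chunk.
            If CryptHashData(hHash, Data(LBound(Data)), UBound(Data) - LBound(Data) + 1, 0&) <> 0 Then
                DigestLen = 20
                If CryptGetHashParam(hHash, HP_HASHVAL, Digest(0), DigestLen, 0&) <> 0 Then
                    For i = 0 To DigestLen - 1
                        Sha1Hex = Sha1Hex & Right$("0" & Hex$(Digest(i)), 2)
                    Next i
                End If
            End If
            CryptDestroyHash hHash
        End If
        CryptReleaseContext hProv, 0&
    End Function

    Public Sub DemoSha1()
        Dim b() As Byte
        b = StrConv("abc", vbFromUnicode)           ' the three bytes 61 62 63
        Debug.Print Sha1Hex(b)                      ' prints A9993E364706816ABA3E25717850C26C9CD0D89D
    End Sub
    ```

    Stored digests make it possible to re-verify one copy later without having the other copy at hand. For a single one-off comparison of two files, both still have to be read in full.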



  3. #3

    Thread Starter
    New Member
    Join Date
    Jul 2019
    Posts
    6

    Re: ReadFileDirectlyToString

    Thanks for answering.
    Yes, I'm only interested in whether they are identical or not.

    But doesn't hashing involve a lot of overhead (calculations, etc.) compared with just comparing two blocks of memory? String comparison is amazingly fast. And by the way, I have no idea what to use to compute a hash. Of course I can search for it, but I think computing two hashes is slower than comparing strings anyway.

    Update:
    Last night I put together a workaround using a known method, CopyMemory:
    Call CopyMemory(ByVal string1, bytearray1(0), BuffLen)
    This transfers the byte arrays to strings, because comparing byte arrays is pathetically slow. It works pretty fast (I tested with 1 MB and 32 MB buffers), but it still involves the copy itself, and avoiding that would be even better.

  4. #4
    Fanatic Member 2kaud's Avatar
    Join Date
    May 2014
    Location
    England
    Posts
    615

    Re: ReadFileDirectlyToString

    Have you tried using one of the command line tools for file compare - such as fc?
    All advice is offered in good faith only. You are ultimately responsible for the effects of your programs and the integrity of the machines they run on. Anything I post, code snippets, advice, etc is licensed as Public Domain https://creativecommons.org/publicdomain/zero/1.0/

    C++17 Compiler: Microsoft VS2019 (16.2.2)

  5. #5
    Frenzied Member PlausiblyDamp's Avatar
    Join Date
    Dec 2016
    Location
    Newport, UK
    Posts
    1,061

    Re: ReadFileDirectlyToString

    Quote Originally Posted by 4x2y View Post
    If you are interested only to know whether the two files are identical without needing to know what and/or where is the difference then you can simply create the hash code of the two file and compare that hash code.
    Generating a hash can be more expensive than a straight string comparison if you only compare something once. If you are comparing a file with multiple other files, or comparing multiple files with each other, then hashing each file once costs much less than comparing the files time and time again.

    Then again, you could short-circuit some of the comparisons by checking file sizes first: if they differ, there is no need to compare contents. If you know the internal structure of the files, you might be able to identify byte offsets that are likely to have changed rather than comparing entire files.

    Memory-mapped files might also help here: see http://www.vbforums.com/showthread.p...pped-File-Demo for VB6 or https://docs.microsoft.com/en-us/dot...y-mapped-files for VB.NET.

  6. #6

    Thread Starter
    New Member
    Join Date
    Jul 2019
    Posts
    6

    Re: ReadFileDirectlyToString

    Quote Originally Posted by 2kaud View Post
    Have you tried using one of the command line tools for file compare - such as fc?
    Hi, thanks for the tip. It's not applicable, because each file in path 1 that matches its counterpart in path 2 has to be moved into another folder (path 3), preserving the original tree structure, and it has to be moved as soon as it matches because of the large number of files involved.


    @PlausiblyDamp:
    I'm doing each comparison only once.
    Yes, I check the file sizes first.
    No, the internal structure of the files does not matter. The files should be identical, but they may have changed over time (it has been some years since they were copied) due to hardware or software corruption, which is why I need to verify them. The data is simply a lot of backups and media files copied to a NAS and five other external hard drives (most of the data exists in only two copies, except a few gigabytes of very important data that exists in three).

    For the future, I'm thinking of computing a hash of every file, or a CRC32/64, to make maintaining the archive easier.
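
    As a sketch of the CRC32 option, here is a table-driven version of the common CRC-32 (reflected polynomial &HEDB88320) written for VB6's signed Long. The names BuildCrcTable and Crc32Update are made up for this example. A running value can be fed chunk by chunk, starting from 0, so a large file never has to sit in memory at once.

    ```vb
    Option Explicit

    Private CrcTable(0 To 255) As Long
    Private TableReady As Boolean

    ' Builds the 256-entry lookup table. VB6 has no unsigned shift, so the
    ' sign bit is masked off after each division by 2.
    Private Sub BuildCrcTable()
        Dim i As Long, j As Long, c As Long
        For i = 0 To 255
            c = i
            For j = 1 To 8
                If c And 1 Then
                    c = (((c And &HFFFFFFFE) \ 2) And &H7FFFFFFF) Xor &HEDB88320
                Else
                    c = ((c And &HFFFFFFFE) \ 2) And &H7FFFFFFF
                End If
            Next j
            CrcTable(i) = c
        Next i
        TableReady = True
    End Sub

    ' Hypothetical helper: folds the first Count bytes of Data into a running CRC.
    ' Pass 0 for the first chunk and the previous result for each later chunk.
    Public Function Crc32Update(ByVal Crc As Long, ByRef Data() As Byte, ByVal Count As Long) As Long
        Dim i As Long
        If Not TableReady Then BuildCrcTable
        Crc = Not Crc                               ' undo the final inversion of the previous call
        For i = 0 To Count - 1
            ' (Crc >> 8) Xor table((Crc Xor byte) And 255), with the shift done by division
            Crc = (((Crc And &HFFFFFF00) \ &H100) And &HFFFFFF) Xor CrcTable((Crc Xor Data(i)) And &HFF)
        Next i
        Crc32Update = Not Crc
    End Function

    Public Sub DemoCrc32()
        Dim b() As Byte
        b = StrConv("123456789", vbFromUnicode)     ' standard CRC check string
        Debug.Print Hex$(Crc32Update(0, b, 9))      ' prints CBF43926
    End Sub
    ```

    A CRC is cheap and good at catching accidental corruption, which is the case described here. It offers no protection against deliberate tampering.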

  7. #7
    Addicted Member
    Join Date
    Aug 2017
    Posts
    190

    Re: ReadFileDirectlyToString

    Try this routine:

    Code:
    Private Declare Function CloseHandle Lib "kernel32.dll" (ByVal hObject As Long) As Long
    Private Declare Function CreateFileW Lib "kernel32.dll" (ByVal lpFileName As Long, ByVal dwDesiredAccess As Long, ByVal dwShareMode As Long, ByVal lpSecurityAttributes As Long, ByVal dwCreationDisposition As Long, ByVal dwFlagsAndAttributes As Long, Optional ByVal hTemplateFile As Long) As Long
    Private Declare Function GetFileSizeEx Lib "kernel32.dll" (ByVal hFile As Long, ByRef lpFileSize As Currency) As Long
    Private Declare Function ReadFile Lib "kernel32.dll" (ByVal hFile As Long, ByRef lpBuffer As Any, ByVal nNumberOfBytesToRead As Long, Optional ByRef lpNumberOfBytesRead As Long, Optional ByVal lpOverlapped As Long) As Long
    Private Declare Function RtlCompareMemory Lib "ntdll.dll" (ByRef Source1 As Any, ByRef Source2 As Any, ByVal Length As Long) As Long
    Private Declare Function SetFilePointerEx Lib "kernel32.dll" (ByVal hFile As Long, ByVal liDistanceToMove As Currency, Optional ByRef lpNewFilePointer As Currency, Optional ByVal dwMoveMethod As Long) As Long
    
    Public Function IsSameFile(ByRef FileName1 As String, ByRef FileName2 As String, Optional ByVal BufferSize As Long = &H100000) As Boolean
        Const FALSE_ = 0&, FILE_CURRENT = 1&, FILE_FLAG_SEQUENTIAL_SCAN = &H8000000, FILE_SHARE_READ = 1&
        Const GENERIC_READ = &H80000000, INVALID_HANDLE_VALUE = (-1&), NULL_ = 0&, OPEN_EXISTING = 3&
        Dim Buffer1() As Byte, BytesRead1 As Long, hFile1 As Long, FileSize1 As Currency, FilePointer As Currency
        Dim Buffer2() As Byte, BytesRead2 As Long, hFile2 As Long, FileSize2 As Currency, CompareMemoryFailed As Boolean
    
        hFile1 = CreateFileW(StrPtr(FileName1), GENERIC_READ, FILE_SHARE_READ, NULL_, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN)
    
        If hFile1 <> INVALID_HANDLE_VALUE Then
            hFile2 = CreateFileW(StrPtr(FileName2), GENERIC_READ, FILE_SHARE_READ, NULL_, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN)
    
            If hFile2 <> INVALID_HANDLE_VALUE Then
                If GetFileSizeEx(hFile1, FileSize1) Then
                    If GetFileSizeEx(hFile2, FileSize2) Then
                        If FileSize1 = FileSize2 Then
                            ReDim Buffer1(0& To BufferSize - 1&) As Byte
                            ReDim Buffer2(0& To BufferSize - 1&) As Byte
    
                            Do
                                If ReadFile(hFile1, Buffer1(0&), BufferSize, BytesRead1) <> FALSE_ And BytesRead1 <> 0& Then
                                    If ReadFile(hFile2, Buffer2(0&), BufferSize, BytesRead2) <> FALSE_ And BytesRead2 <> 0& Then
                                        CompareMemoryFailed = RtlCompareMemory(Buffer1(0&), Buffer2(0&), BytesRead1) <> BytesRead2
                                        If CompareMemoryFailed Then Exit Do 'RtlCompareMemory aborts comparison as soon as a mismatch occurs
                                    Else
                                        Exit Do
                                    End If
                                Else
                                    Exit Do
                                End If
                            Loop
    
                            If Not CompareMemoryFailed Then
                                If SetFilePointerEx(hFile1, 0@, FilePointer, FILE_CURRENT) Then         'Get current file pointer position for File1
                                    If FilePointer = FileSize1 Then                                     'See if the entire file has been read
                                        If SetFilePointerEx(hFile2, 0@, FilePointer, FILE_CURRENT) Then 'Get current file pointer position for File2
                                            IsSameFile = FilePointer = FileSize2                        'IsSameFile returns True only if RtlCompareMemory
                                        End If                                                          'has successfully compared all bytes from both files
                                    End If
                                End If
                            End If
                        End If
                    End If
                End If
    
                hFile2 = CloseHandle(hFile2):   Debug.Assert hFile2
            End If
    
            hFile1 = CloseHandle(hFile1):       Debug.Assert hFile1
        End If
    End Function
    Quote Originally Posted by addyanto View Post
    I'm thinking that for future to do a hash of every file, or a crc32/64, to help with maintain the archive easier.
    You might want to check out CDCheck. In addition to hashing files, it can also compare two directories and tell you which files differ and which are missing from the other folder.
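
    A note on the Currency variables in the routine above, with a small sketch (the function name CurrencyToByteCount is made up for illustration). GetFileSizeEx and SetFilePointerEx work with 64-bit integers, and the declarations receive them in Currency variables, which VB6 interprets as that integer divided by 10,000. Comparing two such raw values with each other, as the routine does, is fine. Mixing one with a plain byte count such as BufferSize requires scaling first.

    ```vb
    Option Explicit

    ' Hypothetical helper: converts a raw Currency filled in by GetFileSizeEx
    ' (a 64-bit integer seen through Currency's fixed 4 decimal places)
    ' into a byte count. CDec avoids overflowing Currency for large values.
    Public Function CurrencyToByteCount(ByVal RawSize As Currency) As Variant
        CurrencyToByteCount = CDec(RawSize) * 10000
    End Function

    Public Sub DemoCurrencyScaling()
        ' A 5 GB file (5,368,709,120 bytes) shows up as 536870.912 in the Currency variable.
        Debug.Print CurrencyToByteCount(536870.912@)    ' prints 5368709120
    End Sub
    ```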

  8. #8

    Thread Starter
    New Member
    Join Date
    Jul 2019
    Posts
    6

    Re: ReadFileDirectlyToString

    Ahaaaaa, RtlCompareMemory. Thanks a lot! I have a BIG feeling that this is what I need.
    I will look into it very soon.

  9. #9

    Thread Starter
    New Member
    Join Date
    Jul 2019
    Posts
    6

    Re: ReadFileDirectlyToString

    Looks like it is (almost) exactly what I needed to do.
    Thanks a lot.


    For the moment my program is doing the job as it is (I cleaned it up a little today and compiled it several hours ago). After some testing (with files I deliberately altered by one byte somewhere inside), it has so far checked and moved >3000 files (26 GB), but a lot more (~7 TB) remains to be done.

  10. #10
    Fanatic Member
    Join Date
    Feb 2003
    Posts
    721

    Re: ReadFileDirectlyToString

    Not sure if this is useful, but: did you know you can use GetLastError to check for error codes immediately after an API function call and FormatMessage to retrieve the message associated with that code?
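
    A minimal sketch of that idea for VB6 (the function name ApiErrorText is made up for this example). In VB6 the error code of a Declare'd call is normally read from Err.LastDllError rather than by calling GetLastError directly, because the runtime may make other API calls in between.

    ```vb
    Option Explicit

    Private Declare Function FormatMessageW Lib "kernel32" (ByVal dwFlags As Long, ByVal lpSource As Long, ByVal dwMessageId As Long, ByVal dwLanguageId As Long, ByVal lpBuffer As Long, ByVal nSize As Long, ByVal Arguments As Long) As Long

    Private Const FORMAT_MESSAGE_FROM_SYSTEM As Long = &H1000&
    Private Const FORMAT_MESSAGE_IGNORE_INSERTS As Long = &H200&

    ' Hypothetical helper: returns the system message text for a Win32 error code.
    Public Function ApiErrorText(ByVal ErrorCode As Long) As String
        Dim Buffer As String, Length As Long
        Buffer = String$(512, vbNullChar)
        ' nSize is given in characters for the W version; the return value is
        ' the number of characters written, or 0 on failure.
        Length = FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM Or FORMAT_MESSAGE_IGNORE_INSERTS, 0&, ErrorCode, 0&, StrPtr(Buffer), Len(Buffer), 0&)
        If Length > 0 Then
            ApiErrorText = Left$(Buffer, Length)
        Else
            ApiErrorText = "Unknown error " & ErrorCode
        End If
    End Function

    ' Usage sketch: call right after the API call that failed, for example
    '   If hFile = INVALID_HANDLE_VALUE Then Debug.Print ApiErrorText(Err.LastDllError)
    ```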

  11. #11

    Thread Starter
    New Member
    Join Date
    Jul 2019
    Posts
    6

    Re: ReadFileDirectlyToString

    Yes, I know, thank you.

    Update on the program: everything was finished and put into use the day after my last message. So far almost 5 TB has been checked and transferred; several tens of files, scattered among them, remain. So far so good.

    The code for the comparison is something like:
    Code:
                    'read files contents into bytearrays
                    lSuccess1 = ReadFileS(FileNumber1, b1(0), lBytesToRead, lBytesRead1, 0&)
                    lSuccess2 = ReadFileS(FileNumber2, b2(0), lBytesToRead, lBytesRead2, 0&)
                    DoEvents
                    If lSuccess1 = 0 Or lSuccess2 = 0 Then GoTo CloseHandles
                    If lBytesRead1 <> lBytesRead2 Then GoTo CloseHandles 'a short read on one side only means the blocks differ
                    If lBytesRead1 > 0 Then
                            CompareMemoryFailed = RtlCompareMemory(b1(0), b2(0), lBytesRead1) <> lBytesRead1
                            If CompareMemoryFailed Then GoTo CloseHandles 'RtlCompareMemory aborts comparison as soon as a mismatch occurs
                    End If
                    
                    PosX = PosX + DimBIGBuffer
                    If PosX + DimBIGBuffer >= FileSize1 Then Exit Do
    and the whole compare routine (maybe someone will use it in the future) is this:
    Code:
    
    
    'md_Compare
    
    Public Function BinaryCompareWHF(wsFileName1$, wsFileName2$, BuffSize As Long, lblStatus As Label, lblTotLen As Label) As Long
    'UNICODE + HUGE FILES + FAST
    'Returns True if the files are identical
    
    'Setup buffer sizes
    Dim DimBIGBuffer&
    DimBIGBuffer = BuffSize
    
    
    'Open files
    Dim hFile1 As Long, FileNumber1 As Long, FileSize1 As Currency, ret1 As Long
    Dim hFile2 As Long, FileNumber2 As Long, FileSize2 As Currency, ret2 As Long
    On Error Resume Next
            hFile1 = CreateFileW(StrPtr(wsFileName1), GENERIC_READ, FILE_SHARE_READ Or FILE_SHARE_WRITE, 0&, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0)
            If Err.Number > 0 Then
                Err.Clear
                FileNumber1 = -1
            Else
                FileNumber1 = hFile1
                ret1 = SetFilePointer(hFile1, 0, 0, FILE_BEGIN)
                API_GetFileSize hFile1, FileSize1
            End If
            
            hFile2 = CreateFileW(StrPtr(wsFileName2), GENERIC_READ, FILE_SHARE_READ Or FILE_SHARE_WRITE, 0&, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, 0)
            If Err.Number > 0 Then
                Err.Clear
                FileNumber2 = -1
            Else
                FileNumber2 = hFile2
                ret2 = SetFilePointer(hFile2, 0, 0, FILE_BEGIN)
                API_GetFileSize hFile2, FileSize2
            End If
    On Error GoTo 0
    'end of opening files
    
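    'if either open failed, close whichever handle did open, then leave
    If hFile1 = INVALID_HANDLE_VALUE Or hFile2 = INVALID_HANDLE_VALUE Then
        If hFile1 <> INVALID_HANDLE_VALUE Then CloseHandle hFile1
        If hFile2 <> INVALID_HANDLE_VALUE Then CloseHandle hFile2
        GoTo ExitBecauseOfInequality
    End If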
    
    'compare sizes
    If FileSize1 <> FileSize2 Then GoTo CloseHandles 'sizes differ: both handles are open, so close them before returning False
    
    'read data begin
    Dim BuffLen&, lBytesToRead&, lBytesRead1&, lBytesRead2&, lSuccess1&, lSuccess2&
    Dim b1() As Byte, b2() As Byte
    Dim s1$, s2$, k&
    
    DoEvents
    
    If FileSize1 >= DimBIGBuffer Then
            
            BuffLen = DimBIGBuffer
            ReDim b1(BuffLen - 1)
            ReDim b2(BuffLen - 1)
            lBytesToRead = BuffLen
            
            Dim PosX As Currency, CompareMemoryFailed As Boolean
            
            
            Do
                    'read files contents into bytearrays
                    lSuccess1 = ReadFileS(FileNumber1, b1(0), lBytesToRead, lBytesRead1, 0&)
                    lSuccess2 = ReadFileS(FileNumber2, b2(0), lBytesToRead, lBytesRead2, 0&)
                    DoEvents
                    If lSuccess1 = 0 Or lSuccess2 = 0 Then GoTo CloseHandles
                    If lBytesRead1 <> lBytesRead2 Then GoTo CloseHandles 'a short read on one side only means the blocks differ
                    If lBytesRead1 > 0 Then
                            CompareMemoryFailed = RtlCompareMemory(b1(0), b2(0), lBytesRead1) <> lBytesRead1
                            If CompareMemoryFailed Then GoTo CloseHandles 'RtlCompareMemory aborts comparison as soon as a mismatch occurs
                    End If
                    
                    PosX = PosX + DimBIGBuffer
                    frmProcessCompare.lblFilename.Caption = FileName(wsFileName1) & " - " & Format$(PosX / FileSize1 * 100, "0.0") & "%"
                    DoEvents
                    If PosX + DimBIGBuffer >= FileSize1 Then Exit Do
                                
                    If StopCompareFunction Then GoTo CloseHandles
            Loop
            
    End If
    
    BuffLen = FileSize1 - PosX
    If BuffLen > 0 Then
            ReDim b1(BuffLen - 1)
            ReDim b2(BuffLen - 1)
            lBytesToRead = BuffLen
            'read files contents into bytearrays
            lSuccess1 = ReadFileS(FileNumber1, b1(0), lBytesToRead, lBytesRead1, 0&)
            lSuccess2 = ReadFileS(FileNumber2, b2(0), lBytesToRead, lBytesRead2, 0&)
            If lSuccess1 = 0 Or lSuccess2 = 0 Then GoTo CloseHandles
            If lBytesRead1 <> lBytesRead2 Then GoTo CloseHandles 'a short read on one side only means the blocks differ
            If lBytesRead1 > 0 Then
                    CompareMemoryFailed = RtlCompareMemory(b1(0), b2(0), lBytesRead1) <> lBytesRead1
                    If CompareMemoryFailed Then GoTo CloseHandles 'RtlCompareMemory aborts comparison as soon as a mismatch occurs
            End If
            PosX = PosX + BuffLen
            frmProcessCompare.lblFilename.Caption = FileName(wsFileName1) & " - " & "100% match"
            UpdateLog FileName(wsFileName1) & " - " & "100% match"
            
            Dim totLen As Currency
            On Error Resume Next
            totLen = CCur(lblTotLen.Tag)
            On Error GoTo 0
            totLen = totLen + PosX
            lblTotLen.Tag = totLen
            lblTotLen.Caption = CvFileLenToMBytesStr(CDbl(totLen)) & " moved"
            lblTotLen.ToolTipText = totLen & " bytes moved"
    '        UpdateLog lblTotLen.Caption
            DoEvents
    End If
    
    
    Equality:
    BinaryCompareWHF = True
    
    
    CloseHandles:
    Dim ret&
    ret = CloseHandle(FileNumber1): If ret = 0 Then MsgBox "Err while closing filename " & wsFileName1
    ret = CloseHandle(FileNumber2): If ret = 0 Then MsgBox "Err while closing filename " & wsFileName2
    
    ExitBecauseOfInequality:
    End Function

    API_GetFileSize was taken from either:
    http://www.vbforums.com/showthread.p...File-I-O-Class
    https://www.codeguru.com/cpp/w-d/doc...File-Limit.htm
    Last edited by addyanto; Jul 29th, 2019 at 06:10 AM. Reason: formatting + 2 references
