Thread: Whats the fastest way to search zip for string

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Aug 2019
    Posts
    194

    Whats the fastest way to search zip for string

    I tried the 7-Zip command line. It's fast, but it never hurts to look for alternatives.
    I have only tried one of the VB6 zip modules, and it was OK.
    I'm looking for something very fast that can scan through 1k or 2k zip files in the same folder and return True or False.

    The search is for names such as xxx.bin, some-name.dat, 000.bin, 00210.bin or sound1.wav.
    I would be truly grateful if the experienced programmers here could share examples that I can look into and learn from.

    I'm looking to load a list of 1k or more zip files into a ListView, add paths to other directories of working zips, then scan all the zips for missing files and repair the bad ones by copying the missing files from a working archive, turning each line green once it is fixed.

    As a test, create 10 random zip files, put 1 or 2 files into any of them at random, then scan and see whether it finds the files in the zips you put them into.
    Last edited by doberman2002; Sep 23rd, 2019 at 10:35 PM.

  2. #2
    Sinecure devotee
    Join Date
    Aug 2013
    Location
    Southern Tier NY
    Posts
    6,582

    Re: Whats the fastest way to search zip for string

    Since you have to unzip the file in order to check for the strings, I can't imagine a faster tool than something built for that purpose.

    The first thought that came to mind when reading the requirements is to somehow avoid having to unzip the files to begin with by keeping what files are in what zip files in a database so you can query the database instead. But since I don't know the hierarchy of how the contents of the zips relate to other zips, and how they are created, perhaps there is a reason this wouldn't be practical.
    It would seem you would also want CRCs of the zip files in the database to quickly check the integrity of the related zip files.

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Aug 2019
    Posts
    194

    Re: Whats the fastest way to search zip for string

    Quote Originally Posted by passel View Post
    Since you have to unzip the file in order to check for the strings, I can't imagine a faster tool than something built for that purpose.

    The first thought that came to mind when reading the requirements is to somehow avoid having to unzip the files to begin with by keeping what files are in what zip files in a database so you can query the database instead. But since I don't know the hierarchy of how the contents of the zips relate to other zips, and how they are created, perhaps there is a reason this wouldn't be practical.
    It would seem you would also want CRCs of the zip files in the database to quickly check the integrity of the related zip files.

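    Put 2 zips in App.Path and load the zip names into List9.

    In the List9 Click event, add this:
    Code:
    Dim sh As Object, N As Object, I As Object
    Set sh = CreateObject("Shell.Application")
    'Text11 presumably holds the folder path (with trailing backslash), List9 the selected zip name
    Set N = sh.NameSpace(CVar(Text11.Text & List9.Text))
    If Not N Is Nothing Then
        For Each I In N.Items
            List1.AddItem I.Name    'List the entry name; nothing is extracted to disk
        Next
    End If
    No need to unzip and make a mess.

    With 7-Zip you can use the command line to search 2k zip files fast, but I need other ways.
    These methods only read the contents listing of the zip; they do not unzip or damage it.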

  4. #4
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Whats the fastest way to search zip for string

    search zip for string
    is the string to be searched the name of the file in the zip or a string within a zipped file?
    pete

  5. #5
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Whats the fastest way to search zip for string

    Strange obsession with "fastest" all the time. Makes one wonder what the end goal is.

    I assumed it was just looking for file names. Not much to that, as long as the Zip archive and any embedded Zip archives aren't passworded:

    Code:
    Private Shell As Object
    
    Private Function SearchFolder( _
        ByVal Folder As Object, _
        ByRef Target As String, _
        ByVal Compare As VbCompareMethod) As Boolean
    
        Dim Index As Variant
        Dim FolderItem As Object
    
        With Folder.Items
            For Index = 0 To .Count - 1
                Set FolderItem = .Item(Index)
                With FolderItem
                    If .IsFolder Then
                        If SearchFolder(FolderItem.GetFolder, Target, Compare) Then
                            SearchFolder = True
                            Exit Function
                        End If
                    ElseIf Not .IsLink Then
                        If StrComp(.Name, Target, Compare) = 0 Then
                            SearchFolder = True
                            Exit Function
                        End If
                    End If
                End With
            Next
        End With
    End Function
    
    Private Function Search( _
        ByVal Zip As Variant, _
        ByRef Target As String, _
        Optional ByVal Compare As VbCompareMethod = vbTextCompare) As Boolean
    
        With Shell
            Search = SearchFolder(.NameSpace(Zip), Target, Compare)
        End With
    End Function
    
    Private Sub ShellInit()
        Set Shell = CreateObject("Shell.Application")
    End Sub
    Pass the path to the Zip archive and the target file name to the Search() function.

    Fastest? No, but simple code with no 3rd party software required.

    If you want to search for embedded Zip archives themselves as well as normal files then a minor tweak would be required. This will also browse within any Cab archives since those are also "folders" to Shell32.
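    One possible reading of that tweak, as a rough sketch only: compare every non-link item's name before deciding whether to recurse, so that an embedded archive such as "inner.zip" can itself be the target. SearchFolderIncludingArchives is a hypothetical name, not something from the code above, and the sketch assumes the same late-bound Shell32 objects. A side effect is that ordinary folder names inside the archive will match as well.

    ```vb
    'Sketch only: variant of SearchFolder that also matches embedded archives by name.
    'SearchFolderIncludingArchives is a hypothetical name, not part of the original code.
    Private Function SearchFolderIncludingArchives( _
        ByVal Folder As Object, _
        ByRef Target As String, _
        ByVal Compare As VbCompareMethod) As Boolean

        Dim Index As Variant
        Dim FolderItem As Object

        With Folder.Items
            For Index = 0 To .Count - 1
                Set FolderItem = .Item(Index)

                'Compare the name first, whether the item is a file, folder or archive
                If Not FolderItem.IsLink Then
                    If StrComp(FolderItem.Name, Target, Compare) = 0 Then
                        SearchFolderIncludingArchives = True
                        Exit Function
                    End If
                End If

                'Then descend into anything Shell32 treats as a folder (Zip, Cab, subfolder)
                If FolderItem.IsFolder Then
                    If SearchFolderIncludingArchives(FolderItem.GetFolder, Target, Compare) Then
                        SearchFolderIncludingArchives = True
                        Exit Function
                    End If
                End If
            Next
        End With
    End Function
    ```

    To try it, call this from Search() in place of SearchFolder. Like the original, it will raise an error if NameSpace() returns Nothing for a path the Shell cannot open.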

  6. #6
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: Whats the fastest way to search zip for string

    IMO this thread (and all the ones before it) outright "cry for a DB",
    to handle all this "List-Filtering and Searching" via SQL.

    Below is an example of how one could approach this more "systematically" using SQLite
    (where the DBEngine itself has builtin ZipArchive-support).



    Code for the above: http://vbRichClient.com/Downloads/ZipSearchSQLite.zip

    HTH

    Olaf

  7. #7
    Sinecure devotee
    Join Date
    Aug 2013
    Location
    Southern Tier NY
    Posts
    6,582

    Re: Whats the fastest way to search zip for string

    My mistake. For some reason I was thinking that you had files that had links in them to other files, and the files with the links had been compressed. You are just searching the list of files that are contained in the zip, i.e. the table of contents of the zip.

  8. #8

    Thread Starter
    Addicted Member
    Join Date
    Aug 2019
    Posts
    194

    Re: Whats the fastest way to search zip for string

    Quote Originally Posted by westconn1 View Post
    is the string to be searched the name of the file in the zip or a string within a zipped file?
    westconn1: a string within the zip file.

    Schmidt: I get a compile error on vbRichClient5.
    Last edited by doberman2002; Sep 25th, 2019 at 04:19 AM.

  9. #9
    Hyperactive Member
    Join Date
    Aug 2017
    Posts
    380

    Re: Whats the fastest way to search zip for string

    FWIW, here's a pure VB6 routine that will determine whether a given ZIP file contains the specified filename or not. It is not as sophisticated nor as well-tested as the other ZIP libraries above, but hopefully, you might learn something from it.

    Code:
    Option Explicit     '    In a standard (.BAS) module
    Option Compare Text '<-- Comment this out if case-insensitive comparisons for the Like operator are unwanted
    
    Private Type CDFH
        Signature         As Long       'Central directory file header signature = 0x02014B50
        VersionMadeBy     As Integer    'Version made by
        MinVersion        As Integer    'Version needed to extract (minimum)
        GPFlag            As Integer    'General purpose bit flag
        CompMethod        As Integer    'Compression method
        LastModTime       As Integer    'File last modification time
        LastModDate       As Integer    'File last modification date
        CRC32             As Long       'CRC-32
        CompSize          As Long       'Compressed size
        UncompSize        As Long       'Uncompressed size
        FileNameLen       As Integer    'File name length
        ExtraFieldLen     As Integer    'Extra field length
        FileCommentLen    As Integer    'File comment length
        DiskStart         As Integer    'Disk number where file starts
        IntFileAttribs    As Integer    'Internal file attributes
        ExtFileAttribs    As Long       'External file attributes
        LocalHeaderOffset As Long       'Relative offset of local file header
        Filename          As String * 2 'File name (variable length)
       'ExtraField        As String * 2 'Extra field (variable length)
       'FileComment       As String * 2 'File comment (variable length)
    End Type
    
    Private Type EOCD
        Signature      As Long       'End of central directory signature = 0x06054B50
        DiskNumber     As Integer    'Number of this disk
        CDStartDisk    As Integer    'Disk where central directory starts
        DiskCDRecords  As Integer    'Number of central directory records on this disk
        TotalCDRecords As Integer    'Total number of central directory records
        CDSize         As Long       'Size of central directory (in bytes)
        CDStartOffset  As Long       'Offset of start of central directory, relative to start of archive
        CommentLength  As Integer    'Comment length
        Comment        As String * 2 'Comment (variable length)
    End Type
    
    'IsFileInZip:   Searches a zip file for the specified filename.
    'ZipFileName:   Absolute or relative path to the zip file.
    'FindFile:      Filename to search for. Can be any pattern that the Like operator accepts.
    'FoundFile:     If supplied, receives the first found filename, if any.
    'Return Value:  True if at least 1 instance of FindFile was successfully found, False otherwise.
    'Notes:         Zip format features like UTF-8 filename encoding, encryption (password), ZIP64
    '               (files > 4GB), multi-part archives & other proprietary extensions are NOT supported!
    '               Does not attempt to recursively search embedded zip files.
    'Reference:     https://en.wikipedia.org/wiki/Zip_(file_format)
    
    Public Function IsFileInZip(ByRef ZipFileName As String, ByRef FindFile As String, Optional ByRef FoundFile As String) As Boolean
        Const CDFH_SIGNATURE = &H2014B50, EOCD_SIGNATURE = &H6054B50, EFS = &H800               'EFS - Language encoding flag
        Dim FN As Integer, Pos As Long, Pos2 As Long
        Dim sFileName As String, CDFH As CDFH, EOCD As EOCD
    
        FN = FreeFile
        On Error GoTo 1
        Open ZipFileName For Binary Access Read Lock Write As FN
            Pos = LOF(FN) + 1&                                                                  'Must add 1 because the Get statement is 1-based
    
            For Pos = Pos - 22& To Pos - &H10015 Step -1&                                       'Scan the file backwards starting from (LOF - EOCD min size) to (LOF - EOCD max size)
                Get FN, Pos, EOCD.Signature
                If EOCD.Signature = EOCD_SIGNATURE Then Exit For
            Next
    
            If EOCD.Signature = EOCD_SIGNATURE Then
                Get FN, Pos, EOCD
                Pos = EOCD.CDStartOffset + 1&                                                   'Must add 1 because the Get statement is 1-based
    
                Do While EOCD.DiskCDRecords                                                     'Examine all central directory records on this disk
                    Get FN, Pos, CDFH
    
                    If CDFH.Signature = CDFH_SIGNATURE Then
                        If (CDFH.VersionMadeBy And &HFF00) = 0 Or _
                           (CDFH.VersionMadeBy And &HFF00) = &HA00 Then                         'If host system is either MS-DOS or Windows NTFS
                            If (CDFH.ExtFileAttribs And vbDirectory) <> vbDirectory Then        'If not a folder
                                If (CDFH.GPFlag And EFS) = 0 Then                               'If filename (and comment) are in the original ZIP character encoding (IBM Code Page 437)
                                    sFileName = Space$(CDFH.FileNameLen)
                                    Get FN, Pos + 46&, sFileName
    
                                    Pos2 = InStrRev(sFileName, "/")                             'See if the filename contains a path
                                    If Pos2 Then
                                        sFileName = Right$(sFileName, Len(sFileName) - Pos2)    'If it does, strip the path from the filename
                                    End If
    
                                    If sFileName Like FindFile Then                             'The Option Compare Text option will determine if the comparison is case-sensitive or not
                                        FoundFile = sFileName
                                        IsFileInZip = True
                                        Exit Do
                                    End If
                                End If
                            End If
                        End If
    
                        Pos = Pos + 46& + CDFH.FileNameLen + CDFH.ExtraFieldLen + CDFH.FileCommentLen
                        EOCD.DiskCDRecords = EOCD.DiskCDRecords - 1
                    Else
                        Exit Do                                                                 'Abort searching the zip file if there is at least 1 invalid central directory signature
                    End If
                Loop
            End If
    1   Close FN
        On Error GoTo 0
    End Function
    
    Private Sub Main()
        Dim sZipFile As String, sFound As String
    
        sZipFile = Dir("*.zip")
    
        Do While LenB(sZipFile)
            If IsFileInZip(sZipFile, "*.*", sFound) Then
                Debug.Print "Found """; sFound; """ in """; sZipFile; """"
            End If
    
            sZipFile = Dir
        Loop
    End Sub
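
    Because FindFile is matched with the Like operator, a literal file name containing ?, *, # or [ would be treated as a pattern rather than as plain text. A minimal sketch of an escaping helper follows. EscapeLikePattern is a hypothetical name, not part of the routine above, and the sketch assumes it sits in the same module as IsFileInZip.

    ```vb
    'Hypothetical helper (an assumption, not part of the routine above):
    'makes a literal file name safe to pass as a Like pattern by wrapping
    'each of the four special characters ? * # [ in square brackets.
    Public Function EscapeLikePattern(ByRef FileName As String) As String
        Dim i As Long, Ch As String, Result As String

        For i = 1 To Len(FileName)
            Ch = Mid$(FileName, i, 1)
            Select Case Ch
                Case "?", "*", "#", "["
                    Result = Result & "[" & Ch & "]"    'e.g. # becomes [#]
                Case Else
                    Result = Result & Ch                'A lone ] is already literal outside a group
            End Select
        Next

        EscapeLikePattern = Result
    End Function

    'Usage sketch: search for the exact name "sound#1.wav" (path is a placeholder)
    'Found = IsFileInZip("C:\zips\a.zip", EscapeLikePattern("sound#1.wav"))
    ```

    Names like 00210.bin or sound1.wav contain none of these characters and can be passed as they are.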

  10. #10
    PowerPoster
    Join Date
    Feb 2006
    Posts
    24,482

    Re: Whats the fastest way to search zip for string

    You can't search a database until you have one. It makes sense to run a "harvester" once to create one if the Zip archives are stable over time, but either way you need a harvester first.
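
    A rough sketch of what such a harvester could look like, under stated assumptions: HarvestFolder and AddEntries are hypothetical names, a late-bound Scripting.Dictionary stands in for a real database table, and entry names are read through Shell32 in the same way as the code in post #5. It makes a single pass over the folder and records, for each entry name, the archives that contain it.

    ```vb
    'Sketch only: one-pass "harvester" mapping entry name -> "|"-separated zip names.
    'HarvestFolder and AddEntries are hypothetical names; the Dictionary is a stand-in for a DB.
    Public Function HarvestFolder(ByVal ZipFolder As String) As Object
        Dim ShellApp As Object, Catalog As Object, Root As Object
        Dim ZipName As String, ZipPath As Variant

        Set ShellApp = CreateObject("Shell.Application")
        Set Catalog = CreateObject("Scripting.Dictionary")
        Catalog.CompareMode = vbTextCompare             'Case-insensitive keys; set before adding

        If Right$(ZipFolder, 1) <> "\" Then ZipFolder = ZipFolder & "\"
        ZipName = Dir$(ZipFolder & "*.zip")
        Do While LenB(ZipName)
            ZipPath = ZipFolder & ZipName               'Pass NameSpace a Variant, not a String
            Set Root = ShellApp.NameSpace(ZipPath)
            If Not Root Is Nothing Then AddEntries Root, ZipName, Catalog
            ZipName = Dir$
        Loop

        Set HarvestFolder = Catalog
    End Function

    Private Sub AddEntries(ByVal Folder As Object, ByRef ZipName As String, ByVal Catalog As Object)
        Dim Entry As Object

        For Each Entry In Folder.Items
            If Entry.IsFolder Then
                AddEntries Entry.GetFolder, ZipName, Catalog    'Subfolders and embedded archives
            ElseIf Catalog.Exists(Entry.Name) Then
                Catalog(Entry.Name) = Catalog(Entry.Name) & "|" & ZipName
            Else
                Catalog.Add Entry.Name, ZipName
            End If
        Next
    End Sub
    ```

    After Set Catalog = HarvestFolder(App.Path), a lookup such as Catalog.Exists("sound1.wav") no longer touches the archives. The catalog goes stale as soon as any Zip changes, so it would have to be rebuilt or persisted along with file dates.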

  11. #11
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: Whats the fastest way to search zip for string

    Quote Originally Posted by doberman2002 View Post
    Schmidt compile error vbrichclien5
    You need to download and install the COM-SQLite-Wrapper (vbRichClient5) first,
    before you try to run the Demo-Project.

    The BaseDlls for that are in the Download-section on vbRichClient.com


    Olaf

  12. #12

    Thread Starter
    Addicted Member
    Join Date
    Aug 2019
    Posts
    194

    Re: Whats the fastest way to search zip for string

    Quote Originally Posted by Schmidt View Post
    You need to download and install the COM-SQLite-Wrapper (vbRichClient5) first,
    before you try to run the Demo-Project.

    The BaseDlls for that are in the Download-section on vbRichClient.com


    Olaf

  13. #13
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: Whats the fastest way to search zip for string

    So, did you "download and install" the needed Dlls?

    You've already stated, that you're "willing to learn" - but do you really -
    (when you don't bother to even read the replies)?

    Olaf

  14. #14

    Thread Starter
    Addicted Member
    Join Date
    Aug 2019
    Posts
    194

    Re: Whats the fastest way to search zip for string

    Quote Originally Posted by Schmidt View Post
    So, did you "download and install" the needed Dlls?

    You've already stated, that you're "willing to learn" - but do you really -
    (when you don't bother to even read the replies)?

    Olaf
    It's asking for ZipFiles.db; is that file missing?
    Maybe it would be better if you could provide installation files for SQLite.
    Last edited by doberman2002; Sep 25th, 2019 at 05:27 PM.

  15. #15
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: Whats the fastest way to search zip for string

    No, the "missing" DBFile will be automatically created in the next line.
    The error-break you got, is due to a non-existing (or "not-registered") Helper-Reference (vbRichClient5).

    So, here again my question:
    did you "download and install" the needed Dlls?
    (from the Download-Page at vbRichClient.com)

    Olaf

  16. #16

    Thread Starter
    Addicted Member
    Join Date
    Aug 2019
    Posts
    194

    Re: Whats the fastest way to search zip for string

    Quote Originally Posted by Schmidt View Post
    No, the "missing" DBFile will be automatically created in the next line.
    The error-break you got, is due to a non-existing (or "not-registered") Helper-Reference (vbRichClient5).

    So, here again my question:
    did you "download and install" the needed Dlls?
    (from the Download-Page at vbRichClient.com)

    Olaf
    I downloaded vbRichClient5.dll ONLY and placed it in the directory of the project you provided, and now when I start the project it throws this error message.

  17. #17

    Thread Starter
    Addicted Member
    Join Date
    Aug 2019
    Posts
    194

    Re: Whats the fastest way to search zip for string

    @Schmidt I registered vbRichClient5.dll and it's opening now; let me test it.
    When adding more than 30 zips, the app fails to load and throws an error message.
    Bear in mind I may have 1k or 3k zips.
    Last edited by doberman2002; Sep 25th, 2019 at 05:42 PM.

  18. #18
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: Whats the fastest way to search zip for string

    Just the vbRichClient5.dll (in a certain Folder on your machine) is not enough.

    Just google for the package: vbRC5BaseDlls.zip

    It contains the "full triple" of Dlls:
    - vbRichClient5.dll
    - vb_cairo_sqlite.dll
    - DirectCOM.dll

    All of them need to be unzipped (together) into a Folder on your dev-machine.
    This Folder does *not* need to be your Project-Folder (I usually place them in C:\RC5\).

    And in exactly that folder, you will then have to register vbRichClient5.dll only.
    And only once...

    HTH

    Olaf

  19. #19
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: Whats the fastest way to search zip for string

    The error message "cannot find end of central directory record" hints at a malformed zip
    (if you google around for that error-string).

    To make the code ignore such Zips (reporting them),
    you can wrap the appropriate block with an error-handler like that:
    Code:
          'provide the second table (ZipContents) with the Parent_ID of the just inserted Record from Cmd1 -
          'and also unzip all raw-contents of the Zip-DataBlob B into this ZipContents-table
          '(which thus ensures an 1:n relationship with the Parent-Table ZipFiles)
          On Error Resume Next
            Cmd2.SetInt32 1, Cnn.LastInsertAutoID
            Cmd2.SetText 2, Left$(DL.FileName(i), Len(DL.FileName(i)) - 4)
            Cmd2.SetBlob 3, B
            Cmd2.Execute
            If Err Then Debug.Print "malformed zip: "; DL.Path & DL.FileName(i)
          On Error GoTo 0
    HTH

    Olaf
