Thread: MD5 Hash

  1. #1

    Thread Starter
    Addicted Member
    Join Date
    Aug 2006
    Posts
    176

    MD5 Hash

    I am currently using the code below to get the MD5 hash of a given file. However, it is very slow! Is there a faster way to do it?

    Code:
    ' "hasher" avoids naming the variable the same as the MD5 type (VB is case-insensitive).
    Using hasher As MD5 = MD5.Create()
        Using stream = File.OpenRead(filename)
            Return BitConverter.ToString(hasher.ComputeHash(stream)).Replace("-", String.Empty)
        End Using
    End Using
    Thank You

  2. #2
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    110,299

    Re: MD5 Hash

    How big is the file? Define "very slow".

  3. #3

    Thread Starter
    Addicted Member
    Join Date
    Aug 2006
    Posts
    176

    Re: MD5 Hash

    Quote Originally Posted by jmcilhinney View Post
    How big is the file? Define "very slow".
    The files are MP4 video files over 500 megs, the duration of each file is about 25 minutes. There will eventually be larger and longer files.

    I forgot to mention that the files are not local; they are on another computer on the same network. Later today I am going to check whether running the program on the machine where the files are stored makes any difference and, if so, how much.

  4. #4

    Thread Starter
    Addicted Member
    Join Date
    Aug 2006
    Posts
    176

    Re: MD5 Hash

    I get the message shown in the image below.
    Attached Images

  5. #5
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    110,299

    Re: MD5 Hash

    Quote Originally Posted by Tesla1886 View Post
    I get message in the image below.
    Don't perform long-running operations on the UI thread. Either use Async/Await or a BackgroundWorker or some other mechanism to avoid doing so.
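
    As a minimal sketch of the Async/Await option, assuming the hashing code from post #1: the names HashHelpers, ComputeMd5Hex and ComputeMd5HexAsync are made up for illustration and are not from the original code. Task.Run pushes the blocking read and hash onto a thread-pool thread, so the UI thread stays responsive.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography
Imports System.Threading.Tasks

' Hypothetical helper module; the names are illustrative only.
Public Module HashHelpers
    ' Synchronous hash, streaming the file rather than loading it all into memory.
    Public Function ComputeMd5Hex(filename As String) As String
        Using hasher As MD5 = MD5.Create()
            Using stream As FileStream = File.OpenRead(filename)
                Return BitConverter.ToString(hasher.ComputeHash(stream)).Replace("-", String.Empty)
            End Using
        End Using
    End Function

    ' Runs the blocking work on a thread-pool thread so the caller can Await it.
    Public Function ComputeMd5HexAsync(filename As String) As Task(Of String)
        Return Task.Run(Function() ComputeMd5Hex(filename))
    End Function
End Module
```

    From an event handler declared as Private Async Sub, it could then be called as Dim hash As String = Await HashHelpers.ComputeMd5HexAsync(filename). This only stops the UI from freezing; it does not make the read itself any faster.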

  6. #6
    Powered By Medtronic dbasnett's Avatar
    Join Date
    Dec 2007
    Location
    Jefferson City, MO
    Posts
    9,754

    Re: MD5 Hash

    So I computed the MD5 for all of the music and video files on my system.

    00:00:32.5008616 - 32 seconds
    5,502 - total files
    18,486,754,664 - total bytes
    64,663,162 - largest file
    3,360,006.3 - average file

    I read the entire file and then did the MD5.

    Code:
        Private fct As Integer = 0
        Private maxF As Long = 0
        Private totF As Long = 0
        Private md5Obj As New System.Security.Cryptography.MD5CryptoServiceProvider
    
        Private Function MD5Enc(path As String) As String
            Dim bytesToHash() As Byte = IO.File.ReadAllBytes(path)
            fct += 1 'count
            If bytesToHash.Length > maxF Then maxF = bytesToHash.Length 'largest file
            totF += bytesToHash.Length 'total for average
            Dim hash() As Byte = md5Obj.ComputeHash(bytesToHash)
            Dim rv As String = BitConverter.ToString(hash).Replace("-", String.Empty)
            bytesToHash = Nothing
            hash = Nothing
            Return rv
        End Function
    My First Computer -- Documentation Link (RT?M) -- Using the Debugger -- Prime Number Sieve
    Counting Bits -- Subnet Calculator -- UI Guidelines -- >> SerialPort Answer <<

    "Those who use Application.DoEvents have no idea what it does and those who know what it does never use it." John Wein

  7. #7
    PowerPoster wqweto's Avatar
    Join Date
    May 2011
    Location
    Sofia, Bulgaria
    Posts
    5,120

    Re: MD5 Hash

    Btw, can you do the same but with line Dim hash() As Byte = md5Obj.ComputeHash(bytesToHash) commented out?

    Subtract both times and now you have the time it takes to MD5 hash 18GB of data.

    OP can divide by a factor of 36 to guess how much time is spent in MD5 hashing their file. (Bet this will be minuscule, less than 0.1 seconds)

    cheers,
    </wqw>

  8. #8
    Powered By Medtronic dbasnett's Avatar
    Join Date
    Dec 2007
    Location
    Jefferson City, MO
    Posts
    9,754

    Re: MD5 Hash

    Quote Originally Posted by wqweto View Post
    Btw, can you do the same but with line Dim hash() As Byte = md5Obj.ComputeHash(bytesToHash) commented out?

    Subtract both times and now you have the time it takes to MD5 hash 18GB of data.

    OP can divide by a factor of 36 to guess how much time is spent in MD5 hashing their file. (Bet this will be minuscule, less than 0.1 seconds)

    cheers,
    </wqw>
    Without hashing,

    00:00:07.3778567
    5,502
    18,486,754,664
    64,663,162
    3,360,006.3
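
    Plugging the two runs into the suggested subtraction gives a rough figure. This is only a sketch: it assumes the read time was the same in both runs (disk caching could skew that), and the module and function names are made up for illustration. With the numbers above it works out to somewhere around 735 MB/s, or roughly 0.7 seconds of pure hashing for a 500 MB file.

```vbnet
Imports System

' Hypothetical helper; the names are illustrative only.
Public Module HashTimeEstimate
    ' Bytes hashed per second, taking hash time = (run with hashing) - (run without hashing).
    Public Function HashBytesPerSecond(totalBytes As Long,
                                       secondsWithHash As Double,
                                       secondsWithoutHash As Double) As Double
        Return totalBytes / (secondsWithHash - secondsWithoutHash)
    End Function

    Public Sub Main()
        ' Figures taken from the two timed runs in this thread.
        Dim rate As Double = HashBytesPerSecond(18486754664L, 32.5008616, 7.3778567)
        ' Assumed sample size: a 500 MB file, as described by the thread starter.
        Dim sampleBytes As Long = 500L * 1000L * 1000L
        Console.WriteLine("Hash rate: " & (rate / 1000000.0).ToString("n0") & " MB/s")
        Console.WriteLine("Estimated hash time for 500 MB: " & (sampleBytes / rate).ToString("n2") & " s")
    End Sub
End Module
```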

    FWIW - the entire code

    Code:
    Public Class Form1
        Private stpw As New Stopwatch
        Private Async Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            Button1.Enabled = False
            Dim tsk As Task
            tsk = Task.Run(Sub()
                               stpw.Start()
                               Debug.WriteLine("---")
                               fct = 0
                               maxF = 0L
                               totF = 0L
                               Dim p As String = Environment.GetFolderPath(Environment.SpecialFolder.MyMusic)
                               DoFolders(p)
                               p = Environment.GetFolderPath(Environment.SpecialFolder.MyVideos)
                               DoFolders(p)
                               stpw.Stop()
                               Debug.WriteLine(stpw.Elapsed)
                               Debug.WriteLine(fct.ToString("n0"))
                               Debug.WriteLine(totF.ToString("n0"))
                               Debug.WriteLine(maxF.ToString("n0"))
                               Debug.WriteLine((totF / fct).ToString("n1"))
                           End Sub)
    
            Await tsk
            Button1.Enabled = True
            ' Stop
        End Sub
    
        Private Sub DoFolders(folder As String)
            Dim p() As String = IO.Directory.GetFiles(folder)
            For Each f As String In p
                Dim s As String = MD5Enc(f)
                ' Debug.WriteLine(s)
            Next
            p = IO.Directory.GetDirectories(folder)
            For Each d As String In p
                DoFolders(d)
            Next
        End Sub
    
        Private fct As Integer = 0
        Private maxF As Long = 0
        Private totF As Long = 0
        Private md5Obj As New System.Security.Cryptography.MD5CryptoServiceProvider
    
        Private Function MD5Enc(path As String) As String
            Dim bytesToHash() As Byte = IO.File.ReadAllBytes(path)
            Dim rv As String = ""
            fct += 1 'count
            If bytesToHash.Length > maxF Then maxF = bytesToHash.Length 'largest file
            totF += bytesToHash.Length 'total for average
            Dim hash() As Byte = md5Obj.ComputeHash(bytesToHash)
            rv = BitConverter.ToString(hash).Replace("-", String.Empty)
            hash = Nothing
            bytesToHash = Nothing
            Return rv
        End Function
    End Class
    Last edited by dbasnett; Jul 26th, 2022 at 12:01 PM.

  9. #9
    PowerPoster wqweto's Avatar
    Join Date
    May 2011
    Location
    Sofia, Bulgaria
    Posts
    5,120

    Re: MD5 Hash

    On my machine (i9-12900K) MD5 is about 1GB/s so your results are on par with this.

    Here are results on my machine using openssl

    Code:
    c:> openssl speed md5 sha1 sha256 sha512
    
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    md5             199696.57k   441886.56k   769767.77k   946297.05k  1011324.39k  1015793.06k
    sha1            335285.12k   864187.22k  1775688.58k  2461186.73k  2735158.88k  2748453.52k
    sha256          328366.22k   826542.42k  1654857.43k  2223597.09k  2477478.12k  2477839.42k
    sha512           43709.36k   176412.67k   252137.11k   345084.39k   385892.27k   389505.24k
    SHA-1 is fastest (nearly 3x MD5 speed) and SHA-256 is a close second, at about 2.5GB/s for large blocks.

    cheers,
    </wqw>

  10. #10
    Fanatic Member
    Join Date
    Jun 2019
    Posts
    557

    Re: MD5 Hash

    Standard home/small office network is 1Gbit/s or about 100MB/s.
    5400RPM HDD sequential reads are about 130-140MB/s
    7200RPM HDD sequential reads are about 220-230MB/s

    So 500MB over the network requires about 5 seconds for the transfer alone, at maximum speed.

    But the network is shared with other computers and devices, including other apps on the same computer. This will slow down the time it takes to "download" the file over the network.

    Hard drives are slow and become painfully slow on random access. And our computers run many apps and services at the same time, which can access the same hard drives and affect read speeds.

    SSDs are much better in terms of random access speed and have much higher sequential speeds. But even with PCIe Gen 4 SSDs and reads of gigabytes per second, we are limited by the network.

    The obvious upgrade is a faster network: 5 or 10Gbit or even faster. But that can become quite expensive if not planned well.
    ----
    Some benchmarks:

    57.5GB of 174 video files - each ~340MB

    Get hash using local hard drive (7200RPM DC grade with about 220-240MB/s sequential read): 4 min 15 sec
    Get hash of same files over 1Gbit/s network (different computer with similar HDD): 15 min 08 sec

    * NOTE: all "benchmarks" are performed using mobile phone stopwatch :-)

    So what is the conclusion? My network is slow (I know, I know): 57.5GB in 15 min 08 sec is only about 63MB/s. My hard drives are not so slow, as they reach their maximum sequential read speed: 57.5GB in 4 min 15 sec is about 225MB/s.

    What if I use SSD instead of HDD? Transferring over network for that "simple" file hash operation will kill the benefits of using SSD. For working with these files locally? Much, much better than HDDs. But with the known limitations: size and price.

    The solution for hashing large files on a remote computer? It should be obvious: compute the hash locally (where the data is) and retrieve it through a network service that returns only the hash (a few bytes) instead of gigabytes of data.

    Or compute the hash when the files are added to the storage and save it as metadata somewhere (a file, a local SQLite db, a network SQL database).
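
    A minimal sketch of the "save as metadata" idea, using a plain sidecar file instead of a database. The ".md5" sidecar convention and the names HashCache and GetOrCreateMd5 are assumptions made up for this example. It would run on the machine that holds the files, so only the short hash text ever needs to cross the network.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

' Hypothetical helper; the names and the ".md5" sidecar convention are illustrative only.
Public Module HashCache
    ' Returns the MD5 of the file as hex text, reusing a stored hash when one is available.
    Public Function GetOrCreateMd5(filename As String) As String
        Dim sidecar As String = filename & ".md5"

        ' Reuse the stored hash only if it is at least as new as the data file.
        If File.Exists(sidecar) AndAlso
           File.GetLastWriteTimeUtc(sidecar) >= File.GetLastWriteTimeUtc(filename) Then
            Return File.ReadAllText(sidecar).Trim()
        End If

        ' Otherwise hash the file once (streamed) and remember the result.
        Dim hashText As String
        Using hasher As MD5 = MD5.Create()
            Using stream As FileStream = File.OpenRead(filename)
                hashText = BitConverter.ToString(hasher.ComputeHash(stream)).Replace("-", String.Empty)
            End Using
        End Using
        File.WriteAllText(sidecar, hashText)
        Return hashText
    End Function
End Module
```

    The timestamp check is a cheap way to notice that a file was replaced after its hash was stored; a database table keyed by path, size and modified date would do the same job for a larger collection.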
