
Thread: [RESOLVED] Faster way to get the last N newest files in a directory???

  1. #1

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    [RESOLVED] Faster way to get the last N newest files in a directory???

    Hello VBFers,

    I have a directory with tens of thousands of files in it. I need to get only the N newest files from it, based on each file's LastWriteTime.
    Currently, I use DirectoryInfo.GetFiles("*.*") and then sort the returned FileInfo array by LastWriteTime descending. Once the sort is done, I grab the first N elements of the FileInfo array...
    While this approach works, I have a strong feeling that it's not optimal. Sorting the array is a performance killer. For a folder with 13,000+ files, the function call takes almost 30 seconds to return.

    Can you suggest a better approach?

    vb Code:
    ' Requires: Imports System.IO and Imports System.Collections.Generic
    Private Function GetNewestFiles(ByVal dirPath As String, ByVal N As Integer) As List(Of FileInfo)
        Dim fileList As New List(Of FileInfo)
        Dim dirInfo As New DirectoryInfo(dirPath)
        ' GetFiles builds the complete array before it returns.
        Dim allFiles() As FileInfo = dirInfo.GetFiles("*.*")
        ' Sort descending by LastWriteTime so the newest files come first.
        Array.Sort(allFiles, Function(x, y) y.LastWriteTime.CompareTo(x.LastWriteTime))
        If allFiles.Length >= N Then
            ' Copy only the first N (newest) entries.
            Dim nFiles(N - 1) As FileInfo
            Array.Copy(allFiles, nFiles, N)
            fileList.AddRange(nFiles)
        Else
            ' Fewer than N files exist: return them all.
            fileList.AddRange(allFiles)
        End If
        Return fileList
    End Function
    Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
    - Abraham Lincoln -

  2. #2
    Master Of Orion ForumAccount's Avatar
    Join Date
    Jan 2009
    Location
    Canada
    Posts
    2,802

    Re: Faster way to get the last N newest files in a directory???

    Right off the top I can suggest Directory.EnumerateFiles. The following is a quote from the documentation for the method:
    Quote Originally Posted by MSDN
    The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
    In other words, EnumerateFiles uses deferred execution: you can start processing the IEnumerable result before the last file has even been collected. That should be a pretty decent boost in performance when working with the number of files you mention.
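
    As a rough illustration of processing files while they are still being enumerated, here is a minimal sketch that keeps only the N newest candidates as each file arrives, so no full sort is needed. It assumes .NET 4 or later and uses DirectoryInfo.EnumerateFiles (so each item is already a FileInfo); the module and function names are hypothetical, and it has not been timed against the folder described in this thread.

    ```vb
    Imports System.Collections.Generic
    Imports System.IO

    Module NewestFilesSketch
        ' Hypothetical helper: returns up to n files, newest first.
        Public Function GetNewestFilesStreaming(ByVal dirPath As String, ByVal n As Integer) As List(Of FileInfo)
            Dim newest As New List(Of FileInfo)
            If n <= 0 Then Return newest

            Dim dirInfo As New DirectoryInfo(dirPath)
            ' Each FileInfo is handled as soon as the enumerator yields it.
            For Each f As FileInfo In dirInfo.EnumerateFiles("*")
                ' Find the position that keeps the list ordered newest first.
                Dim pos As Integer = newest.Count
                While pos > 0 AndAlso newest(pos - 1).LastWriteTime < f.LastWriteTime
                    pos -= 1
                End While
                ' Ignore files older than all n current candidates.
                If pos < n Then
                    newest.Insert(pos, f)
                    ' Drop the oldest candidate once more than n are held.
                    If newest.Count > n Then newest.RemoveAt(newest.Count - 1)
                End If
            Next
            Return newest
        End Function
    End Module
    ```

    The inner loop costs up to N comparisons per file, so this only makes sense when N is small compared with the number of files in the folder.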

  3. #3
    PowerPoster
    Join Date
    Sep 2006
    Location
    Egypt
    Posts
    2,579

    Re: Faster way to get the last N newest files in a directory???

    If you want to know the newest file while your program is running then FileSystemWatcher can help!
    Code:
        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            FileSystemWatcher1.Path = "G:\2012\"
    
        End Sub
    
        Private Sub FileSystemWatcher1_Created(ByVal sender As Object, ByVal e As System.IO.FileSystemEventArgs) Handles FileSystemWatcher1.Created
            Debug.Print(e.Name)
        End Sub



  4. #4
    Frenzied Member MattP's Avatar
    Join Date
    Dec 2008
    Location
    WY
    Posts
    1,227

    Re: Faster way to get the last N newest files in a directory???

    OrderByDescending seems quicker than CompareTo. I noticed almost no delay on a folder containing 8k files.

    vb.net Code:
    ' Requires a reference to System.Core and Imports System.Linq (.NET 3.5 or later)
    Private Function GetNewestFiles(dirPath As String, N As Integer) As List(Of IO.FileInfo)
        Dim di As New IO.DirectoryInfo(dirPath)
        Return (di.GetFiles("*").OrderByDescending(Function(f) f.LastWriteTime).Take(N)).ToList()
    End Function
    This pattern is common to all great programmers I know: they're not experts in something as much as experts in becoming experts in something.

    The best programming advice I ever got was to spend my entire career becoming educable. And I suggest you do the same.

  5. #5
    PowerPoster dunfiddlin's Avatar
    Join Date
    Jun 2012
    Posts
    8,245

    Re: Faster way to get the last N newest files in a directory???

    Or you can get MattP to come up with another one of those fancy one-liner jobs that nobody understands.
    Last edited by dunfiddlin; Aug 30th, 2012 at 04:38 PM.

  6. #6
    Master Of Orion ForumAccount's Avatar
    Join Date
    Jan 2009
    Location
    Canada
    Posts
    2,802

    Re: Faster way to get the last N newest files in a directory???

    I would suggest not using GetFiles: it has to populate the entire array before the function returns. EnumerateFiles is a much better choice for large file sets, as mentioned in Microsoft's documentation.

  7. #7
    Frenzied Member MattP's Avatar
    Join Date
    Dec 2008
    Location
    WY
    Posts
    1,227

    Re: Faster way to get the last N newest files in a directory???

    Quote Originally Posted by ForumAccount
    I would suggest not using GetFiles: it has to populate the entire array before the function returns. EnumerateFiles is a much better choice for large file sets, as mentioned in Microsoft's documentation.
    Doesn't Directory.EnumerateFiles return an IEnumerable(Of String), whereas DirectoryInfo.GetFiles returns a FileInfo()? He's sorting on LastWriteTime, which would require him to convert every string to a FileInfo object. If you were parsing the file name for some information, I could see EnumerateFiles being more efficient.

  8. #8
    Master Of Orion ForumAccount's Avatar
    Join Date
    Jan 2009
    Location
    Canada
    Posts
    2,802

    Re: Faster way to get the last N newest files in a directory???

    You can call EnumerateFiles on a DirectoryInfo, which returns FileInfo objects rather than strings. All of this is in the documentation; I know that stanav could have figured it out.
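
    For reference, a minimal sketch of that combination: DirectoryInfo.EnumerateFiles feeding the LINQ query from post #4. It assumes .NET 4 or later; the module name is hypothetical. Note that OrderByDescending still has to buffer every FileInfo before it can yield the first result, so the enumeration itself streams but the sort does not.

    ```vb
    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Linq

    Module EnumerateFilesSketch
        ' Returns up to n files, newest first, without building a FileInfo() array first.
        Public Function GetNewestFiles(ByVal dirPath As String, ByVal n As Integer) As List(Of FileInfo)
            Dim di As New DirectoryInfo(dirPath)
            ' EnumerateFiles on a DirectoryInfo yields FileInfo objects directly,
            ' so LastWriteTime is available with no string-to-FileInfo conversion.
            Return di.EnumerateFiles("*") _
                     .OrderByDescending(Function(f) f.LastWriteTime) _
                     .Take(n) _
                     .ToList()
        End Function
    End Module
    ```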

  9. #9
    Karen Payne MVP kareninstructor's Avatar
    Join Date
    Jun 2008
    Location
    Oregon
    Posts
    6,684

    Re: Faster way to get the last N newest files in a directory???

    The following is C#. Perhaps try the sample on your folders, and if it is faster, build it into a DLL; otherwise there may still be useful information in it.
    http://www.codeproject.com/Articles/...ory-Enumerator

  10. #10

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    Re: Faster way to get the last N newest files in a directory???

    @ForumAccount & Matt: Thank you for suggesting the EnumerateFiles method and LINQ. I believe either of these approaches would give me better performance than what I had. However, I forgot to mention that I'm still targeting .NET 2.0, so I can't try any of your suggestions... Will definitely keep them in mind for future reference though.

    @Kevin: Thank you for pointing me to the codeproject article. I ended up using their FastDirectoryEnumerator class and I'm quite happy with the result. For a folder with over 13,000 files, the DirectoryInfo.GetFiles call I used before took 30+ seconds to return, while FastDirectoryEnumerator.GetFiles now takes only 2+ seconds to complete the same task.
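
    For anyone else stuck on .NET 2.0 without a third-party class, one variation that might be worth timing is to read every LastWriteTime once into a key array and let Array.Sort order the files by those keys, so the sort never touches the FileInfo properties. This is only a sketch: the module and function names are hypothetical, it is not what FastDirectoryEnumerator does, and whether it helps depends on where the 30 seconds actually go, which this thread does not establish.

    ```vb
    Imports System
    Imports System.Collections.Generic
    Imports System.IO

    Module KeyedSortSketch
        ' Hypothetical helper: returns up to n files, newest first.
        Public Function GetNewestFilesKeyed(ByVal dirPath As String, ByVal n As Integer) As List(Of FileInfo)
            Dim allFiles() As FileInfo = New DirectoryInfo(dirPath).GetFiles("*")

            ' Read each timestamp exactly once, up front.
            Dim keys(allFiles.Length - 1) As DateTime
            For i As Integer = 0 To allFiles.Length - 1
                keys(i) = allFiles(i).LastWriteTime
            Next

            ' Sorts keys ascending and reorders allFiles to match.
            Array.Sort(keys, allFiles)

            ' Walk backwards from the newest entry until n files are collected.
            Dim result As New List(Of FileInfo)
            Dim idx As Integer = allFiles.Length - 1
            While idx >= 0 AndAlso result.Count < n
                result.Add(allFiles(idx))
                idx -= 1
            End While
            Return result
        End Function
    End Module
    ```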

  11. #11

    Thread Starter
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,289

    Re: Faster way to get the last N newest files in a directory???

    Quote Originally Posted by 4x2y
    If you want to know the newest file while your program is running then FileSystemWatcher can help!
    Code:
        Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
            FileSystemWatcher1.Path = "G:\2012\"
    
        End Sub
    
        Private Sub FileSystemWatcher1_Created(ByVal sender As Object, ByVal e As System.IO.FileSystemEventArgs) Handles FileSystemWatcher1.Created
            Debug.Print(e.Name)
        End Sub
    The FileSystemWatcher won't meet my need in this case because:
    1. I need to get N newest files, not just the newest file.
    2. I need to get them on demand (i.e. at a button click). I can't have the user sitting there waiting for an FSW event to occur.

    Thanks for the suggestion anyway...

  12. #12
    PowerPoster
    Join Date
    Mar 2002
    Location
    UK
    Posts
    4,780

    Re: [RESOLVED] Faster way to get the last N newest files in a directory???

    If it helps add context, I did the same thing here last year.
