-
Aug 30th, 2012, 02:12 PM
#1
[RESOLVED] Faster way to get the last N newest files in a directory???
Hello VBFers,
I have a directory with tens of thousands of files in it. I need to get only the N newest files from it, based on file LastWriteTime.
Currently, I use DirectoryInfo.GetFiles("*.*") and then sort the returned FileInfo array by LastWriteTime descending. Once the sort is done, I grab the first N elements of the FileInfo array...
While this approach works, I have a strong feeling that it's not optimal. The sorting of the array is a performance killer. For a folder with 13,000+ files, the function call takes almost 30 seconds to return.
Can you suggest a better approach?
vb Code:
Private Function GetNewestFiles(ByVal dirPath As String, ByVal N As Integer) As List(Of FileInfo)
    Dim fileList As New List(Of FileInfo)
    Dim dirInfo As New DirectoryInfo(dirPath)
    ' GetFiles builds the complete array before returning.
    Dim allFiles() As FileInfo = dirInfo.GetFiles("*.*")
    ' Sort newest first.
    Array.Sort(allFiles, Function(x, y) y.LastWriteTime.CompareTo(x.LastWriteTime))
    If allFiles.Length >= N Then
        ' Copy only the first N (newest) entries.
        Dim nFiles(N - 1) As FileInfo
        Array.Copy(allFiles, nFiles, N)
        fileList.AddRange(nFiles)
    Else
        fileList.AddRange(allFiles)
    End If
    Return fileList
End Function
Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
- Abraham Lincoln -
-
Aug 30th, 2012, 04:13 PM
#2
Re: Faster way to get the last N newest files in a directory???
Right off the top I can suggest Directory.EnumerateFiles. The following is a quote from the documentation for the method:
Originally Posted by MSDN
The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.
In other words, EnumerateFiles uses deferred execution: you can start processing the IEnumerable result before the last file has even been collected. That should give a pretty decent boost in performance when working with the number of files you mention.
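To make the deferred-execution point concrete, here is a minimal sketch, assuming .NET 4.0 or later (where Directory.EnumerateFiles exists). The module name EnumerateFilesSketch and the helper FirstFilePaths are made up for illustration; they are not from the MSDN page.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module EnumerateFilesSketch
    ' Hypothetical helper: returns at most maxCount file paths, in whatever
    ' order the file system hands them back (no sorting is done here).
    Function FirstFilePaths(ByVal dirPath As String, ByVal maxCount As Integer) As List(Of String)
        Dim result As New List(Of String)
        If maxCount <= 0 Then Return result
        ' Directory.EnumerateFiles yields paths lazily, so leaving the loop
        ' early means the remaining entries are never requested.
        For Each filePath As String In Directory.EnumerateFiles(dirPath)
            result.Add(filePath)
            If result.Count >= maxCount Then Exit For
        Next
        Return result
    End Function
End Module
```

Note that for the newest-N problem every file still has to be looked at once, so the likely gain is in not building the full array up front rather than in skipping files.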
-
Aug 30th, 2012, 04:25 PM
#3
Re: Faster way to get the last N newest files in a directory???
If you want to know the newest file while your program is running then FileSystemWatcher can help!
Code:
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    FileSystemWatcher1.Path = "G:\2012\"
End Sub

Private Sub FileSystemWatcher1_Created(ByVal sender As Object, ByVal e As System.IO.FileSystemEventArgs) Handles FileSystemWatcher1.Created
    Debug.Print(e.Name)
End Sub
-
Aug 30th, 2012, 04:28 PM
#4
Re: Faster way to get the last N newest files in a directory???
OrderByDescending seems quicker than CompareTo. I noticed almost no delay on a folder containing 8k files.
vb.net Code:
Private Function GetNewestFiles(dirPath As String, N As Integer) As List(Of IO.FileInfo)
    Dim di As New IO.DirectoryInfo(dirPath)
    Return (di.GetFiles("*").OrderByDescending(Function(f) f.LastWriteTime).Take(N)).ToList()
End Function
This pattern is common to all great programmers I know: they're not experts in something as much as experts in becoming experts in something.
The best programming advice I ever got was to spend my entire career becoming educable. And I suggest you do the same.
-
Aug 30th, 2012, 04:33 PM
#5
Re: Faster way to get the last N newest files in a directory???
Or you can get MattP to come up with another one of those fancy one-liner jobs that nobody understands.
Last edited by dunfiddlin; Aug 30th, 2012 at 04:38 PM.
-
Aug 30th, 2012, 04:50 PM
#6
Re: Faster way to get the last N newest files in a directory???
I would suggest not using GetFiles: it has to populate the entire array before the function returns. EnumerateFiles is a much better choice for large file sets, as Microsoft's documentation mentions.
-
Aug 30th, 2012, 04:57 PM
#7
Re: Faster way to get the last N newest files in a directory???
Originally Posted by ForumAccount
I would suggest not using GetFiles: it has to populate the entire array before the function returns. EnumerateFiles is a much better choice for large file sets, as Microsoft's documentation mentions.
Doesn't Directory.EnumerateFiles return an IEnumerable(Of String), whereas DirectoryInfo.GetFiles returns a FileInfo()? He's sorting on LastWriteTime, which would require him to convert every string to a FileInfo object. If you were parsing the file name for some information, I could see EnumerateFiles being more efficient.
-
Aug 30th, 2012, 05:09 PM
#8
Re: Faster way to get the last N newest files in a directory???
You can call EnumerateFiles on a DirectoryInfo, and that version gives you FileInfo objects rather than strings. All this information is in the documentation; I know that Stanav could have figured it out.
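A rough sketch of what that could look like, assuming .NET 4.0 or later and a reference to System.Core for LINQ. The module and function names (NewestFilesSketch, GetNewestFilesStreaming) are invented for this example, and I haven't timed it against the folders discussed in this thread.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq

Module NewestFilesSketch
    ' Hypothetical helper: returns up to n files, newest first.
    ' Requires .NET 4.0+ for DirectoryInfo.EnumerateFiles.
    Function GetNewestFilesStreaming(ByVal dirPath As String, ByVal n As Integer) As List(Of FileInfo)
        Dim di As New DirectoryInfo(dirPath)
        ' EnumerateFiles yields FileInfo objects lazily. OrderByDescending
        ' still has to see every file before Take(n) can return anything,
        ' but no intermediate FileInfo() array is built first.
        Return di.EnumerateFiles("*").
                  OrderByDescending(Function(f) f.LastWriteTime).
                  Take(n).
                  ToList()
    End Function
End Module
```

Calling GetNewestFilesStreaming("G:\2012\", 10) would return the ten most recently written files in that folder, or fewer if the folder holds fewer than ten.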
-
Aug 30th, 2012, 07:28 PM
#9
Re: Faster way to get the last N newest files in a directory???
The following is C#. Perhaps try the sample on your folders and, if it is faster, build it into a DLL; otherwise there may still be useful information in it.
http://www.codeproject.com/Articles/...ory-Enumerator
-
Sep 4th, 2012, 08:22 AM
#10
Re: Faster way to get the last N newest files in a directory???
@ForumAccount & Matt: Thank you for suggesting the EnumerateFiles method and LINQ. I believe either of these approaches would give me better performance than what I had. However, I forgot to mention that I'm still targeting .NET 2.0, so I can't try any of your suggestions... Will definitely keep them in mind for future reference though.
@Kevin: Thank you for pointing me to the CodeProject article. I ended up using their FastDirectoryEnumerator class and I'm quite happy with the result. For a folder with over 13,000 files, the DirectoryInfo.GetFiles call I used before took 30+ seconds to return, while FastDirectoryEnumerator.GetFiles now takes only 2+ seconds to complete the same task.
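For anyone else stuck on .NET 2.0 who can't pull in the CodeProject class, here is a rough sketch that uses only framework calls available in 2.0. It reads each LastWriteTime once into a parallel key array and lets Array.Sort reorder both arrays together, so no comparison delegate is needed. The names NewestFilesNet20Sketch and GetNewestFilesNet20 are invented for this example. It still calls GetFiles, so it is untested against the timings above and should not be expected to match FastDirectoryEnumerator.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module NewestFilesNet20Sketch
    ' Hypothetical helper using only .NET 2.0 APIs:
    ' returns up to n files, newest first.
    Function GetNewestFilesNet20(ByVal dirPath As String, ByVal n As Integer) As List(Of FileInfo)
        Dim files() As FileInfo = New DirectoryInfo(dirPath).GetFiles()
        ' Read each timestamp exactly once into a parallel key array.
        Dim keys(files.Length - 1) As DateTime
        For i As Integer = 0 To files.Length - 1
            keys(i) = files(i).LastWriteTime
        Next
        ' Sorts keys ascending and reorders files to match.
        Array.Sort(keys, files)
        Dim result As New List(Of FileInfo)
        ' Walk backwards from the newest entry until n files are collected.
        For i As Integer = files.Length - 1 To 0 Step -1
            If result.Count >= n Then Exit For
            result.Add(files(i))
        Next
        Return result
    End Function
End Module
```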
-
Sep 4th, 2012, 08:29 AM
#11
Re: Faster way to get the last N newest files in a directory???
Originally Posted by 4x2y
If you want to know the newest file while your program is running then FileSystemWatcher can help!
Code:
Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    FileSystemWatcher1.Path = "G:\2012\"
End Sub

Private Sub FileSystemWatcher1_Created(ByVal sender As Object, ByVal e As System.IO.FileSystemEventArgs) Handles FileSystemWatcher1.Created
    Debug.Print(e.Name)
End Sub
The FileSystemWatcher won't meet my need in this case because:
1. I need to get N newest files, not just the newest file.
2. I need to get them on demand (i.e. at a button click). I can't have the user sitting there waiting for an FSW event to occur.
Thanks for the suggestion anyway...
-
Sep 4th, 2012, 08:39 AM
#12
Re: [RESOLVED] Faster way to get the last N newest files in a directory???
If it helps add context, I did the same thing here last year.