Directory Tree demonstrates how to list all subdirectories under a directory. Simply specify the "root" directory and output file.
This can be useful, for example, when writing a program that searches for files.
A significantly faster method is to request directories only, not all files.
Hence the line...
Code:
Item = Dir$("*.*", vbArchive Or vbDirectory Or vbHidden Or vbSystem)
should read:
Code:
Item = Dir$("*.", vbArchive Or vbDirectory Or vbHidden Or vbSystem)
The difference in performance is very significant, typically minutes, when there is a large number of directories with many files in them.
Note that the underlying APIs (FindFirstFile, etc.) will still enumerate files matching the specified pattern whether they are wanted or not. ;)
Thanks for the replies,
Tech99: thanks for the suggestion. Why did you omit the last wild card (*. instead of *.*)? Directory names do sometimes have an extension.
Bonnie West:
Does this mean it doesn't really matter?
Anyway,
Writing code, especially like this, is always a balance between readability and simplicity. Using the API directly would probably be fastest.
Too bad if that is the situation. It slows things down substantially.
We have instructed users not to use '.' in directory names in our file handling applications. Otherwise performance on file systems containing over 10k folders would be very sluggish, because you are enumerating the files as well, usually tens of thousands to at least a few hundred thousand.
Since my program is meant as a generic demonstration, I tried not to exclude anything for the sake of speed. If people feel they could use my code, they're perfectly free to make any changes they like, for whatever reason. So I'm probably going to leave the code "as is" for now. But I also understand your point about having to make certain sacrifices (such as omitting directories with extensions in their name) in order to keep things fast.
Thanks for the comments.
I fully understand that, and I believe others do too. However, my point here is to highlight a case where 'wrong coding' easily leads to an 'unusable application'*. Been there, done that.
*Poor performance when the file and folder count is largish.
For better performance one might use the FindFirstFileEx API with the FindExInfoBasic info level and the FIND_FIRST_EX_LARGE_FETCH flag.
Even better performance is achieved using the NtQueryDirectoryFile API.
aaand... the best performance is achieved by using either of the previous methods and ditching NT as a file server altogether in favour of, for example, RHE Linux and Samba shares = two to three fold better file system performance.**
** or, in the NT case, code a system driver that reads and parses the MFT; then the folder and file names of the entire disk can be read within a few seconds, regardless of folder and file count.
According to the documentation, FindFirstFile "searches a directory for a file or subdirectory with a name that matches a specific name (or partial name if wildcards are used)."
What that means is that the search pattern cannot guarantee that only files or only folders will be found. Files & folders can both have dot (.) characters in their names and it is also perfectly possible for both of them to not have any dot characters in their names.
Here's an example. Consider a directory that contains these files & folders:
New Folder
New.Folder
New Text Document txt
New Text Document.txt
The *.* search pattern would match all 4 of those files & folders. The *. search pattern, OTOH, would match New Folder and New Text Document txt.
So, as you can see, it isn't possible to instruct FindFirstFile to return files only or folders only. Performance differs of course because files usually have extension names and excluding them from the search will indeed result in a much faster directory listing.
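To make the example above concrete, here is a small model of it, written in Python rather than VB6 so that it runs on any platform. It is a deliberate simplification and not the real Win32 wildcard matcher: it assumes that `*.*` accepts every name and that `*.` accepts only names without a dot, which is how the two patterns behave on the four sample names. The function name `matches` is made up for this sketch.

```python
# Simplified model of the two search patterns discussed above.
# Assumption: "*.*" accepts every name, "*." accepts names with no dot.
# This is NOT the real Win32 wildcard matcher, only an illustration.

def matches(pattern, name):
    if pattern == "*.*":
        return True
    if pattern == "*.":
        return "." not in name
    raise ValueError("this sketch only models '*.*' and '*.'")

names = [
    "New Folder",
    "New.Folder",
    "New Text Document txt",
    "New Text Document.txt",
]

all_items = [n for n in names if matches("*.*", n)]
no_dot_items = [n for n in names if matches("*.", n)]

print(all_items)
print(no_dot_items)
```

The second list still mixes a folder with a file and loses New.Folder, so the pattern alone cannot stand in for a check of the directory attribute.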
Yes, I know, I should have been more specific in my description.
When you request folders using *.*, the API function returns all files and folders, a very large number of items in the worst case. When you request folder names using *. instead, the file count drops substantially and performance is much better. Of course, after the find call one should still filter out files by checking the directory flag.
Our customer systems typically have thousands of folders, and the file count easily exceeds 20k. That makes a significant difference, and it is easily handled by limiting/prohibiting the use of dots in folder names.
Code:
'part of...
Sub RecurseFolder(ByRef pstrFolder As String)
    Dim lngHandle As Long
    Dim strFile As String
    Dim typFind As WIN32_FIND_DATA
    If Right$(pstrFolder, 1) = "\" Then pstrFolder = Left$(pstrFolder, Len(pstrFolder) - 1)
    lngHandle = FindFirstFile(pstrFolder & "\*.", typFind)
    Do While lngHandle <> INVALID_HANDLE_VALUE
        strFile = Left$(typFind.cFileName, InStr(typFind.cFileName, vbNullChar) - 1)
        If (typFind.dwFileAttributes And vbDirectory) = vbDirectory Then
            If Left$(strFile, 1) <> "." Then
                RecurseFolder pstrFolder & "\" & strFile
            End If
        End If
        'Assumed continuation (the posted excerpt ends above): advance, or close and stop
        If FindNextFile(lngHandle, typFind) = 0 Then
            FindClose lngHandle
            Exit Do
        End If
    Loop
End Sub
Here is very fast enumerator (EnumFolders.zip)
http://www.vbforums.com/showthread.p...=1#post4936619
On the contrary. Poor performance when only folder names need to be enumerated... but a small change makes it perform. Although this version cannot use the '*.' notation, hence it is slower than the method above.
However, I am planning to write an enumeration class using NtQueryDirectoryFile, which should be a bit faster, especially on network shares.
Code:
Public Sub EnumFolders(ByVal sPath As String, _
                       Optional ByVal sPattern As String = "*.*", _
                       Optional ByVal lAttributeFilter As FileAttributes = Attr_ALL, _
                       Optional ByVal bRecurse As Boolean = True)
    Dim lHandle As Long
    Dim sName As String
    Dim Lines As Long
    Dim lPtr As Long
    lPtr = VarPtr(wFD)
    On Error GoTo ProcedureError
    sPath = QualifyPath(sPath)
    lHandle = FindFirstFileW(StrPtr("\\?\" & sPath & sPattern), lPtr)
    If lHandle > 0 Then
        Do
            With wFD
                sName = TrimNull(.cFileName)
                If (.dwFileAttributes And vbDirectory) Then
                    If bRecurse Then
                        If AscW(sName) <> vbDot Then 'skip . and .. entries
                            RaiseEvent ItemDetails(sPath, sName) 'Added line
                            EnumFolders sPath & sName, sPattern, lAttributeFilter, bRecurse
                        End If
                    End If
                'ElseIf (.dwFileAttributes And lAttributeFilter) Then 'Commented out
                    'RaiseEvent ItemDetails(sPath, sName) 'Commented out
                End If
            End With
        Loop While FindNextFileW(lHandle, lPtr) > 0
    End If
    FindClose lHandle
    Exit Sub
ProcedureError:
    Debug.Print "Error " & Err.Number & " " & Err.Description & " of EnumFolders"
End Sub
Do you mean the Enumfolders Modified sample?
http://www.vbforums.com/attachment.p...5&d=1447745715
That one is slow, because it does not enumerate folders only, but files as well. I tested it against my code on one file share.
Share with 116 folders: Enumfolders modified took around 16 seconds, and true folders-only enumeration took under one second.
Share with 17418 folders and over half a million files: Enumfolders modified took around 13 minutes, and true folders-only enumeration took about 2.3 seconds.
The performance of this version (List folders.zip, post #13 in this thread) is acceptable: 17418 folders took about 7.3 seconds.
The difference from my code comes mainly from listing the folders; in my version the folders are only read into an array.
So *.* vs *. really does matter. It would be interesting to see how NtQueryDirectoryFile would perform.
Here is Bonnie West's last submitted version (List folders.zip, post #13 in this thread), slightly optimized: about half a second better performance per 10k paths.
Modification to the mnuBrowse_Click() subroutine (add three lines).
Code:
ListFolders sPath, "*."   '<-- The "*." pattern fails to list folder names such as "New.Folder"
'ListFolders sPath, "*.*" '<-- The "*.*" pattern is a bit slower, but would find folder names such as "New.Folder"
'Add these next three lines to the mnuBrowse_Click() subroutine
For i = LBound(dirNames) To UBound(dirNames)
    SendMessage m_hWndLB, LB_ADDSTRING, 0&, StrPtr(dirNames(i))
Next i
Replace the whole ListFolders subroutine with this code.
Code:
Private Sub ListFolders(ByRef FolderPath As String, Optional ByRef Pattern As String = "*")
    'NOTE!!! FolderPath should not end in a trailing backslash (\)
    Const ALLOC_CHUNK = 10&
    Dim hFindFile As Long
    Dim i As Long
    Dim Length As Long
    Dim SubFolder As String
    Static lCount As Long
    hFindFile = FindFirstFile(FolderPath & "\" & Pattern, m_WFD)
    If hFindFile <> INVALID_HANDLE_VALUE Then
        Do 'Process folders only (junctions, symlinks & mounted folders won't be recursed)
            If (m_WFD.dwFileAttributes And (FILE_ATTRIBUTE_DIRECTORY Or FILE_ATTRIBUTE_REPARSE_POINT)) And Asc(m_WFD.cFileName) <> vbDot Then
                Length = lstrlen(m_WFD.cFileName)
                SubFolder = Left$(m_WFD.cFileName, Length)
                lCount = lCount + 1
                ReDim Preserve dirNames(lCount) As String
                dirNames(lCount) = FolderPath & ("\" & SubFolder & "\")
                ListFolders FolderPath & ("\" & SubFolder), Pattern 'Recurse subfolders
            End If
        Loop While FindNextFile(hFindFile, m_WFD)
        hFindFile = FindClose(hFindFile): Debug.Assert hFindFile
    End If
End Sub
Add to module or form level.
Code:
Private Const vbDot = 46
Dim dirNames() As String
No, I was referring to the attachment in my post.
Yes, the search pattern greatly influences indeed the amount of file and folder names that FindFirstFile will return. However, note that the *. pattern does not filter out names such as ".Folder" or "..File". ;)
In the original code:
Code:
If (m_WFD.dwFileAttributes And (FILE_ATTRIBUTE_DIRECTORY Or FILE_ATTRIBUTE_REPARSE_POINT)) = FILE_ATTRIBUTE_DIRECTORY Then
the highlighted part served to ensure that only regular folders were processed. Junctions, symbolic links and mounted folders all have the FILE_ATTRIBUTE_REPARSE_POINT flag set, so in order to skip them, the expression must include the highlighted portion above.
Regarding the testing of the "." and ".." entries, code that utilizes the Asc(W) function fails to take into account file & folder names that begin with the dot character. Such names (which are more like extension-only names) are legal and are actually not really that rare (at least in my system).
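The difference between testing the first character and testing for the two special entries by name can be sketched as follows. This is a language-neutral illustration in Python, not the VB6 code from the attachment; the helper names `is_dot_entry` and `starts_with_dot` are invented for the example, and the sample names are taken from this thread.

```python
def is_dot_entry(name):
    """True only for the two special entries found in every directory listing."""
    return name in (".", "..")

def starts_with_dot(name):
    """The first-character test, which also rejects legitimate names."""
    return name.startswith(".")

names = [".", "..", ".Folder", "New.Folder", "New Folder"]

kept_by_exact_test = [n for n in names if not is_dot_entry(n)]
kept_by_first_char = [n for n in names if not starts_with_dot(n)]

print(kept_by_exact_test)
print(kept_by_first_char)
```

Only the exact comparison keeps ".Folder"; the first-character test drops it together with "." and "..".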
Yes, that is what I thought too, I was just asking.
Quote:
Yes, the search pattern greatly influences indeed the amount of file and folder names that FindFirstFile will return. However, note that the *. pattern does not filter out names such as ".Folder" or "..File". ;)
Sure, we know that. However, one must bear in mind that most files do have an extension and folders do not, at least when using a dot in folder names is prohibited, so in a well-documented and well-instructed user environment it makes a significant difference.
...and when your search pattern is nailed to '*.', the user does not find incorrectly named folders (if they used Explorer or other 'standard' tooling to create/rename them), so '*.' steers them very effectively to follow the rules. :)
Quote:
Regarding the testing of the "." and ".." entries, code that utilizes the Asc(W) function fails to take into account file & folder names that begin with the dot character. Such names (which are more like extension-only names) are legal and are actually not really that rare (at least in my system).
Yes, that is also on purpose/by design.
That expression is checking for both FILE_ATTRIBUTE_DIRECTORY and FILE_ATTRIBUTE_REPARSE_POINT bit flags. (If we neglect to check for FILE_ATTRIBUTE_REPARSE_POINT, we could possibly recurse into a junction, symbolic link or mounted folder and we normally don't want to do that [theoretically, infinite recursion could happen].) In order for the test to succeed, only the FILE_ATTRIBUTE_DIRECTORY bit must be set (if FILE_ATTRIBUTE_REPARSE_POINT is also set, then the entire expression evaluates to False). That is the purpose of the equality test at the end.
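The effect of the equality test can be checked with plain integers. The sketch below uses Python only so that it can be run anywhere; the attribute values are the standard Win32 constants, while the function names `is_plain_folder` and `loose_test` are made up for the illustration.

```python
# Win32 file attribute bits (values from the Windows SDK headers).
FILE_ATTRIBUTE_HIDDEN = 0x2
FILE_ATTRIBUTE_DIRECTORY = 0x10
FILE_ATTRIBUTE_ARCHIVE = 0x20
FILE_ATTRIBUTE_REPARSE_POINT = 0x400

MASK = FILE_ATTRIBUTE_DIRECTORY | FILE_ATTRIBUTE_REPARSE_POINT

def is_plain_folder(attrs):
    """True only when the directory bit is set and the reparse bit is clear."""
    return (attrs & MASK) == FILE_ATTRIBUTE_DIRECTORY

def loose_test(attrs):
    """The weaker test without the equality: true when either masked bit is set."""
    return (attrs & MASK) != 0

folder = FILE_ATTRIBUTE_DIRECTORY
hidden_folder = FILE_ATTRIBUTE_DIRECTORY | FILE_ATTRIBUTE_HIDDEN
junction = FILE_ATTRIBUTE_DIRECTORY | FILE_ATTRIBUTE_REPARSE_POINT
file_symlink = FILE_ATTRIBUTE_ARCHIVE | FILE_ATTRIBUTE_REPARSE_POINT
plain_file = FILE_ATTRIBUTE_ARCHIVE

for label, attrs in [("folder", folder), ("hidden folder", hidden_folder),
                     ("junction", junction), ("file symlink", file_symlink),
                     ("plain file", plain_file)]:
    print(label, is_plain_folder(attrs), loose_test(attrs))
```

Without the equality, a junction and even a symbolic link to a file pass the test, which is exactly the case the comparison against FILE_ATTRIBUTE_DIRECTORY rules out.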
BTW, most of the folder names in my system that either contain or begin with the dot character weren't created by me. Programs such as GIMP, Java, Pale Moon, Notepad++ and even Windows itself (see e.g. the winsxs folder) all created folders that contained dot characters without my intervention. IMO, if one is going to write a generic directory walker code that uses FindFirstFile & co., details like dot characters in folder names and the possibility of encountering junctions, symbolic links and mounted folders must be taken into account.
OK, I see. Excellent point, so I stand corrected.
Directory evaluation corrected:
Code:
'dot is not allowed in directory names
If (m_WFD.dwFileAttributes And (FILE_ATTRIBUTE_DIRECTORY Or FILE_ATTRIBUTE_REPARSE_POINT)) = FILE_ATTRIBUTE_DIRECTORY And (Asc(m_WFD.cFileName) <> vbDot) Then
As for hard/soft/symbolic links or junctions, it seems there are none other than the ones Windows itself generates (All Users, Application Data, Documents etc.).
Users typically do not create or define those, and neither do the engineering or other apps we use. I briefly checked and did not find folder names containing dots in our Notepad++ or GIMP installations; Java and the Pale Moon browser we don't even run.
Here is my Unicode class. In my example I fill an RTB (RichTextBox) with all folders from C:.
The class does some nice things and is extracted from the M2000 Interpreter (there I use it to fill a user control, a special list box that can show folders and files in the form of a tree).
In a form, only these lines are needed to get all folders along the path. Because the list can hold files too, we have to use Mid$() to skip the folder marker. So if you set NoFiles to False you get the files too.
Code:
Dim md As New recDir

Private Sub Form_Load()
    Dim a$, i As Long, k As Long
    RichTextBox1 = ""
    md.Nofiles = True
    a$ = md.Dir2("C:\", , True)
    Me.Caption = md.listcount
    k = 1
    For i = 1 To md.listcount
        RichTextBox1.SelStart = k
        a$ = Trim$(Str$(i)) + " " + Mid$(md.List(i), 2) + vbCrLf
        k = k + Len(a$)
        RichTextBox1.SelText = a$
    Next i
End Sub
Attachment 132389
Replace the form code from the previous example and you can use the recDir class with events. You can stop the search by using the close button on the window.
Code:
Dim WithEvents md As recDir, working As Boolean, getout As Boolean
Dim k As Long, i As Long

Private Sub Form_Load()
    Set md = New recDir
    Dim a$, i As Long, k As Long
    k = 1
    Show
    working = True
    RichTextBox1 = ""
    md.Nofiles = True
    laststr = "C:\"
    a$ = md.Dir2("C:\", , True)
End Sub

Private Sub Form_Resize()
    If Me.ScaleHeight > 1000 And Me.ScaleWidth > 1000 Then
        RichTextBox1.Move 0, 0, Me.ScaleWidth, Me.ScaleHeight
    End If
End Sub

Private Sub Form_Terminate()
    Set md = Nothing
End Sub

Private Sub Form_Unload(Cancel As Integer)
    md.abort = True
    If working Then Cancel = True: getout = True
End Sub

Private Sub md_DirFinished()
    working = False
    If getout Then
        Unload Me
    Else
        Me.Caption = md.listcount
        MsgBox "finished"
    End If
End Sub

Private Sub md_feedback(FileName As String)
    RichTextBox1.SelStart = k
    i = i + 1
    'a$ = Trim$(Str$(i)) + Space$(md.ReadLevel(md.listcount - 1) + 1) + FileName + vbCrLf
    a$ = Trim$(Str$(i)) + md.FindFolder(md.listcount - 1) + Mid$(FileName, 2) + vbCrLf
    k = k + Len(a$)
    RichTextBox1.SelText = a$
End Sub
Yeah, junctions and symbolic links are generally underused in Windows. Mounted folders, however, might be slightly more common.
- GIMP created ".gimp-2.8" and ".thumbnails" folders under my user profile folder.
- Java created "jre1.8.0_51" and "jre1.8.0_60" folders under its installation directory.
- Pale Moon (and most likely other Firefox based browsers) created 2 "*.default" folders under the 2 "Profiles" folders in the "AppData" directory.
- Notepad++ created a "user.manual" folder under its installation directory.
- Finally, and I don't know how this got into my system, Microsoft (?) created a "Microsoft.NET" folder in the "Program Files" directory.
The point Peter Swinkels and I are trying to make is that the "*." pattern is not a totally fool-proof way of filtering out files and returning folders only. Even if your users were told to avoid using the dot character in their folder names, some programs may just be out of your control. FindFirstFile et al. are probably not the best choice for you if you require both rapid searching and folders-only enumeration.
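For comparison, here is a minimal sketch of a folders-only walk that decides by entry type instead of by name, so dotted folder names are kept. It is written in Python with `os.scandir` rather than with the FindFirstFile family discussed in this thread, and `list_folders` is a hypothetical name. The sketch only guarantees that symbolic links are not followed; Windows junctions and mounted folders would need a separate check that is not shown here.

```python
import os

def list_folders(root):
    """Return every real subfolder under root (order is not guaranteed).

    Entries are selected by type, not by name, so names containing dots
    are kept. Symbolic links are not followed, which avoids endless loops.
    """
    found = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                subdirs = [entry.path for entry in entries
                           if entry.is_dir(follow_symlinks=False)]
        except OSError:
            continue  # unreadable folder: skip it rather than abort the walk
        found.extend(subdirs)
        stack.extend(subdirs)
    return found
```

Called as `list_folders("C:\\Data")` (the path is only an example), it returns the full paths of all subfolders. Because no name pattern is involved, a folder such as "user.manual" is reported just like any other.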
Tech99,
Using this line you can add spaces (indentation) but not the full path:
Code:
a$ = Trim$(Str$(i)) + Space$(md.ReadLevel(md.listcount - 1) + 1) + FileName + vbCrLf
As you can see, building each full path needs a search in the list. So the right approach is to show the folders using the level, and to do the search only once, when the user clicks on a folder.
The idea behind storing a level and only a name is to keep the stored information to a minimum. We don't have to repeat each path in each line. So you get the minute because calling md.FindFolder(md.listcount - 1) for each line adds more time to the final loop.
If you look at the recDir class, there is a way to define a stop level (say 3 subfolders), and you can select the sorting method.
In my signature you will find my M2000 Interpreter. There I have a control I made using this class (in the demo above I put a small function that lives in a module in M2000). The control shows a directory, or a tree of files, and can search in the background using the code asynchronously. And as you can see, you can break the search at any time. If you look at the code, there is a DoEvents that is called using a Mod operator. So if you change the rate at which DoEvents occurs, you get a faster or slower search.
Another way to use the recDir class is to fill the internal array once:
Code:
Sub testme(p$, tp$)
    Dim md As New recDir, offset As Long
    offset = Len(p$) + 1
    md.LevelStop = 1
    md.SortType = 2 ' change that
    a$ = md.Dir2(p$, tp$)
    Do
        a$ = md.Dir2()
        If a$ <> "" Then Debug.Print a$ Else Exit Do
    Loop
    md.Nofiles = True
    a$ = md.Dir2(p$)
    Do
        a$ = md.Dir2()
        If a$ <> "" Then Debug.Print Mid$(a$, offset) Else Exit Do
    Loop
End Sub
You didn't read my post at all?
So I will clarify my point of view once more. We don't care if an application creates its own dotted folder names (for thumbnails or whatever); those folders are not project data folders under the data folder tree created by project people, i.e. where engineers, designers, accountants etc. store their work files.
Like Bonnie already said, you cannot say that one should not use "." in folder names and then ignore folders which do have "." in their name.
This should be a generic code submission.
So others can use it as a template and adapt it to their needs.
I did, and I believe I understood every word you wrote pretty well.
The only reason why I joined this thread was because you stated in your post #2:
Since my post #3, I have been trying to point out to you that that method works well only in your particular case. Other programmers, especially "those junior coders", who will come to this thread in the future might get the impression that it's OK to use the "*." pattern if all they want to enumerate are folders only. Well, it is not, as my demonstrations above have shown.
You mentioned above that you were planning on writing an enumeration class using NtQueryDirectoryFile. I'm not familiar with that API, but if it has the capability to enumerate folders only (and do it quickly), then there is probably no more reason to forbid your users from using the dot character in folder names.
Not sure about that yet, but NtQuery is the fastest route to the file system driver.
For directory-only enumeration there is the FindFirstFileEx API; I found that out yesterday when studying this matter. To modify your sample to use the Ex version:
Code:
'Add to declaration section
Private Enum FINDEX_INFO_LEVELS
    FindExInfoStandard = 0&
    FindExInfoBasic = 1& 'supported in W7 and newer
    FindExInfoMaxInfoLevel = 2&
End Enum
Private Enum FINDEX_SEARCH_OPS
    FindExSearchNameMatch = 0&
    FindExSearchLimitToDirectories = 1&
    FindExSearchLimitToDevices = 2&
    FindExSearchMaxSearchOp = 3&
End Enum
Private Const FIND_FIRST_EX_LARGE_FETCH = 2
Private Declare Function FindFirstFileEx Lib "kernel32.dll" Alias "FindFirstFileExA" (ByVal lpFileName As String, _
    ByVal FindExInfoLevel As FINDEX_INFO_LEVELS, lpFindFileData As WIN32_FIND_DATA, ByVal FindExSearchOp As FINDEX_SEARCH_OPS, lpSearchFilter As Any, ByVal dwAdditionalFlags As Long) As Long
'Change FindFirstFile call to FindFirstFileEx call in Private Sub ListFolders.
'hFindFile = FindFirstFile(FolderPath & "\" & Pattern, m_WFD)
'lpSearchFilter is declared As Any, so ByVal 0& is needed to pass a NULL pointer
hFindFile = FindFirstFileEx(FolderPath & "\" & Pattern, FINDEX_INFO_LEVELS.FindExInfoBasic, m_WFD, FINDEX_SEARCH_OPS.FindExSearchLimitToDirectories, ByVal 0&, 0&)
Difference in performance is about one second per 1.2K folders when the pattern is '*.', and a bit more than that when the pattern is '*.*'.
Now the '*.*' pattern performs quite well in large filesystems.
If your program is only being used in NT-based OSs, then you can improve the performance some more by just simply replacing all ANSI APIs with their Unicode counterparts. Of course, that means you'll either need to use StrPtr() when passing Strings or you'll have to set a reference to a type library where the Unicode APIs are declared.
Bonnie West
I provided a Unicode solution above, with a sorting routine. You can alter it to match your needs. It works with events also. Just tell me if you have seen it.
One scenario with the FindFirstFileEx API that needs measuring came to my mind: enabling the FIND_FIRST_EX_LARGE_FETCH flag and trying the search pattern '*'. Performance could be even better if that kind of call automatically recursed into subfolders.
Code:
hFindFile = FindFirstFileEx(FolderPath & "\" & Pattern, FINDEX_INFO_LEVELS.FindExInfoBasic, m_WFD, FINDEX_SEARCH_OPS.FindExSearchLimitToDirectories, ByVal 0&, FIND_FIRST_EX_LARGE_FETCH)
Edit: tested performance, with somewhat mixed results. The Windows 7 workstation benefited from enabling the FIND_FIRST_EX_LARGE_FETCH flag and setting the pattern to '*'. The server search did not; it was actually a bit slower than querying without the large fetch flag.
W7 workstation, 1285 folders
Pattern *.* -> 6.8731, 6.9714 and 6.8767 seconds.
Pattern * and Large_Fetch -> 5.3779, 5.4598, 5.5258 seconds.
W12K R2 server, 19459 folders
Pattern *.* -> 3.1201, 3.1146, 3.1199 seconds.
Pattern * and Large_Fetch -> 3.2430, 3.2397, 3.2519 seconds.
W12K R2 server, via share 17434 folders
Pattern *.* -> 26.1318, 26.5813, 26.5810 seconds.
Pattern * and Large_Fetch -> 24.6481, 24.5381, 24.5071 seconds.
Interesting that the server machine performs worse when the FIND_FIRST_EX_LARGE_FETCH flag is enabled.
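The measurements above are easier to compare as a relative change of the mean. The following sketch does that arithmetic in Python; the timings are copied from the lists above, while the dictionary layout and the helper name `percent_change` are only choices made for this example.

```python
from statistics import mean

# Timings in seconds, copied from the measurements above.
runs = {
    "W7 workstation": {
        "*.*": [6.8731, 6.9714, 6.8767],
        "* + LARGE_FETCH": [5.3779, 5.4598, 5.5258],
    },
    "W12K R2 server": {
        "*.*": [3.1201, 3.1146, 3.1199],
        "* + LARGE_FETCH": [3.2430, 3.2397, 3.2519],
    },
    "W12K R2 server via share": {
        "*.*": [26.1318, 26.5813, 26.5810],
        "* + LARGE_FETCH": [24.6481, 24.5381, 24.5071],
    },
}

def percent_change(baseline, candidate):
    """Relative change of the mean in percent; negative means the candidate is faster."""
    return (mean(candidate) - mean(baseline)) / mean(baseline) * 100.0

summary = {
    system: percent_change(timings["*.*"], timings["* + LARGE_FETCH"])
    for system, timings in runs.items()
}

for system, change in summary.items():
    print(f"{system}: {change:+.1f}%")
```

By this measure the large fetch variant is roughly 21% faster on the workstation, about 4% slower on the server locally, and about 7% faster through the share.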
btw... adding 19459 folders to the listbox took 9.5 seconds (mean).
Quite a surprise that SMB performance is so much worse (CPU Xeon E5 2620 v3 with 12 cores, 40 GB memory, 4 x gigabit network, Smart Array P440ar controller); tuning tips are welcome. :)
Increase AdditionalWorkerThreads, or what else should I try?
https://msdn.microsoft.com/en-us/lib...(v=vs.85).aspx
https://redmondmag.com/articles/2014...-problems.aspx
https://technet.microsoft.com/en-us/.../jj134210.aspx
Thanks, georgekar! Yeah, I've already seen your attachment in post #22. I hope you don't mind, but I favor a different approach when it comes to optimizing directory enumeration. I prefer to do things as directly as possible so that intermediate "steps" such as Events are skipped.
Much faster (more than 3 times faster)
Attachment 132457
I did an optimization. Before, I had a general IsDir() function, but now, because I have the attribute data, I know when I have a folder. Secondly, I pass the folder name and all folders in the RaiseEvent.
Check the code above.
Tech99
There is a huge directory in Windows 7, C:\Windows\winsxs, with 6808 folders, 6800 of them with many dots in their names. If you exclude those folders you gain time, but you miss a lot....
And what do you mean by non-cached performance... Does Windows cache all directories of C:\?
No, I didn't test against the C:\ folder, but against the same data folder structure used in the previous tests, so there were no dots in directory names. Windows does some caching, i.e. when you read a folder structure a second time, the time taken drops. So to get a true measurement this must be dealt with: either flush the cache or disable caching altogether when testing.
https://msdn.microsoft.com/en-us/lib...(v=vs.85).aspx
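One simple way to see the caching effect described above is to time the same enumeration several times in a row and compare the first run with the later ones. The sketch below is in Python and makes no attempt to flush or disable the cache; `time_runs` is a hypothetical helper that accepts any callable, so the enumeration routine itself is left open.

```python
import time

def time_runs(action, repeats=3):
    """Call action() repeatedly and return the elapsed seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    return timings

def describe(timings):
    """Summarise the first run against the fastest of the later runs."""
    first, later = timings[0], timings[1:]
    best_later = min(later) if later else first
    return f"first run {first:.4f} s, best later run {best_later:.4f} s"
```

For example, `describe(time_runs(lambda: sum(1 for _ in os.walk(root))))` with `import os` and a `root` of your choosing prints both figures. A first run that is much slower than the later ones suggests the later runs were served from cache.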