
Thread: Needs Optimization - too slow!

  1. #1

    Thread Starter
    Addicted Member finn0013's Avatar
    Join Date
    Jan 2001
    Charleston, SC

Needs Optimization - too slow!

    I have the following code that I need optimized. I have gotten the rest of the application running pretty fast, however I am not sure how to get this faster.

A little insight - in the code below the variable blData is a string that holds, on average, about 5 MB. It is simply a text file that has been read into a string. I am searching for a string within that data. If it is found, I remove everything from the start of the match up to the next vbCrLf.

    I am open to all suggestions - if there is any way to make this run faster I would like to hear it.

    VB Code:
Friend Sub CheckFile(FilePath As String, Optional isFolder As Boolean = False)
    Dim xPos As Long
    Dim xPos2 As Long
    Dim oEntry As String
    Dim sEntry() As String
    Dim f As mFileApi.FileInfo
    Dim tp As String
    Dim fStat As String
    Dim fType As String
    Dim fSize As String
    Dim fCreated As String
    Dim fMod As String
    Dim fAccessed As String
    Dim fClr As Long

On Error GoTo ERROR_OCCURRED

    If wasCancelled Then Exit Sub

    fStat = ""

    If isFolder Then
        'Folder - don't bother w/properties
        fType = "Folder"
        fSize = ""
        fCreated = ""
        fMod = ""
        fAccessed = ""
    Else
        'File - get properties
        f = mFileApi.GetFileInfo(FilePath)
        fType = "File"
        fSize = f.fSize
        fCreated = f.fCreated
        fMod = f.fModified
        fAccessed = f.fAccessed
    End If

    'Check to see if the FilePath was found in orig file
    xPos = InStr(1, blData, FilePath, vbTextCompare)

    If xPos <> 0 Then
        'Item found - compare props
        xPos2 = InStr(xPos, blData, vbCrLf, vbBinaryCompare)

        If xPos2 = 0 Then
            'End char not found - grab to end of file
            oEntry = Mid(blData, xPos)
            blData = Left(blData, xPos - 1)
        Else
            'Remove the string from orig file
            oEntry = Mid(blData, xPos, xPos2 - xPos)
            blData = Left(blData, xPos - 1) & Right(blData, Len(blData) - xPos2 - 1)
        End If

        sEntry = Split(oEntry, ",")

        'Check to see if the file has changed
        If Trim(fSize) <> Trim(sEntry(3)) Or _
            Trim(fCreated) <> Trim(sEntry(4)) Or _
            Trim(fMod) <> Trim(sEntry(5)) Then

            fStat = "Changed"
            fClr = CLR_CHANGED
        Else
            'File has not changed - add it to the unchanged list
            If Len(blUnchanged) > 0 Then
                blUnchanged = blUnchanged & vbCrLf
            End If

            If Right(oEntry, Len(vbCrLf)) = vbCrLf Then
                oEntry = Left(oEntry, Len(oEntry) - Len(vbCrLf))
            End If

            blUnchanged = blUnchanged & oEntry
        End If
    Else
        'Not found in orig file - new
        fStat = "New"
        fClr = CLR_NEW
    End If

    If fStat <> "" Then
        InsertRow FilePath, fStat, fType, fSize, fCreated, fMod, fAccessed, fClr
    End If

    Exit Sub

ERROR_OCCURRED:
    If Err.Number = 70 Then Resume Next
    LogError "CheckFile2"
End Sub

  2. #2
    Super Moderator si_the_geek's Avatar
    Join Date
    Jul 2002
    Bristol, UK
One simple thing that should speed things up a bit: use the $ versions of the string functions, e.g.:

    VB Code:
oEntry = Mid$(blData, xPos, xPos2 - xPos)
blData = Left$(blData, xPos - 1) & Right$(blData, Len(blData) - xPos2 - 1)
The versions without $ don't actually return Strings, just Variants which contain strings. This means the values need to be converted from strings to Variants, and back to strings again (as your variables are Strings).

    I'm afraid I haven't got time to check all your code, I'd recommend stepping through it to see where the biggest delays are - then work on the areas that are slowest (or tell us, we can help more if we know the exact problems).

  3. #3

    Thread Starter
    Addicted Member finn0013's Avatar
    Join Date
    Jan 2001
    Charleston, SC
The slowest part seems to be the call to InStr - it takes a while to search through all 5 MB of data. This process will be running against 50,000+ files, so every bit of optimization I can get helps.

    Is there a way to perform this search faster? Maybe an API call to search a file contents as opposed to reading it into a string first? Would this be faster even if there is an API call for it?

  4. #4
    Super Moderator si_the_geek's Avatar
    Join Date
    Jul 2002
    Bristol, UK
    I don't know any API for it (you could try the API link below). If there isn't one for files, there may be for memory (to InStr the string faster).

    I notice "blData" being used in lots of places, but not defined. Am I right in assuming that it's a global/module level variable? Is it unaltered between calls of this function?

If so, you could create a copy of it split by vbCrLf (then check each element in the array); maybe that would improve the speed a bit?

And another thought... is there any particular reason that this is in a text file rather than a database?
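    The split idea above could be sketched roughly like this, reusing the variable names from the original post. Treat it as an untested outline, not working code:

    VB Code:
    Dim blLines() As String
    Dim i As Long

    'Do the expensive Split once, before looping over all 50,000+ files
    blLines = Split(blData, vbCrLf)

    'Then, per file, scan individual lines instead of the whole 5 MB string
    For i = 0 To UBound(blLines)
        If InStr(1, blLines(i), FilePath, vbTextCompare) > 0 Then
            oEntry = blLines(i)   'the whole entry - no Mid/Left surgery needed
            blLines(i) = ""       'mark it consumed rather than rebuilding blData
            Exit For
        End If
    Next i

    The win is that blData never has to be reassembled with Left/Right concatenation after each hit; blanking the consumed element does the same job.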

  5. #5

    Thread Starter
    Addicted Member finn0013's Avatar
    Join Date
    Jan 2001
    Charleston, SC
    Yes, it is a global variable. The only change in between each method call is that if the search parameter is found the string is removed.

    The only reason that it is not currently in a database is that I wanted the functionality of being able to save each result as a separate file for later comparison. With enough design I could emulate this in DB if I have no other choice.

I will try the same method with both an array and with a memory API and see if that speeds things up a bit. (I had not even thought of using a memory API.)


  7. #7
    Join Date
    Oct 2003
InStr will always be slow on this much data. There are faster search methods than sequential character comparison, and I'm sure that InStr uses them; you could write another one, but it will cook your brain getting your head round it.

    To be honest, you're going to have to either:

    buy a faster machine

    write the program in another language

    create indexes for every file, much like a search engine for the web would - then you can easily find it in your files. But if this is a one-shot replacement, don't bother: the time you spend trying to optimise it would be better spent getting it running on those files, because by the time you've found a way to halve the time it takes, you'll be three times past the deadline.

  8. #8
    Former Admin/Moderator MartinLiss's Avatar
    Join Date
    Sep 1999
    San Jose, CA

    Part 1

    Optimizing Code

    Unless you're doing tasks like generating fractals, your applications are unlikely to be limited by the actual processing speed of your code. Typically other factors — such as video speed, network delays, or disk activities — are the limiting factor in your applications. For example, when a form is slow to load, the cause might be the number of controls and graphics on the form rather than slow code in the Form_Load event. However, you may find points in your program where the speed of your code is the gating factor, especially for procedures that are called frequently. When that's the case, there are several techniques you can use to increase the real speed of your applications:

    Avoid using Variant variables.

    Use Long integer variables and integer math.

    Cache frequently used properties in variables.

Use module-level variables instead of Static variables.

    Replace procedure calls with inline code.

    Use constants whenever possible.

    Pass arguments with ByVal instead of ByRef.

    Use typed optional arguments.

    Take advantage of collections.
    Even if you’re not optimizing your code for speed, it helps to be aware of these techniques and their underlying principles. If you get in the habit of choosing more efficient algorithms as you code, the incremental gains can add up to a noticeable overall improvement in speed.

    Avoid Using Variant Variables
    The default data type in Visual Basic is Variant. This is handy for beginning programmers and for applications where processing speed is not an issue. If you are trying to optimize the real speed of your application, however, you should avoid Variant variables. Because Visual Basic converts Variants to the appropriate data type at run time, operations involving other simple data types eliminate this extra step and are faster than their Variant equivalents.

    A good way to avoid Variants is to use the Option Explicit statement, which forces you to declare all your variables. To use Option Explicit, check the Require Variable Declaration check box on the Editor tab of the Options dialog box, available from the Tools menu.

    Be careful when declaring multiple variables: If you don’t use the As type clause, they will actually be declared as Variants. For example, in the following declaration, X and Y are variants:

    Dim X, Y, Z As Long

    Rewritten, all three variables are Longs:

    Dim X As Long, Y As Long, Z As Long

    For More Information To learn more about Visual Basic data types, see "Data Types" in "Programming Fundamentals."

    Use Long Integer Variables and Integer Math
    For arithmetic operations avoid Currency, Single, and Double variables. Use Long integer variables whenever you can, particularly in loops. The Long integer is the 32-bit CPU's native data type, so operations on them are very fast; if you can’t use the Long variable, Integer or Byte data types are the next best choice. In many cases, you can use Long integers when a floating-point value might otherwise be required. For example, if you always set the ScaleMode property of all your forms and picture controls to either twips or pixels, you can use Long integers for all the size and position values for controls and graphics methods.

    When performing division, use the integer division operator (\) if you don’t need a decimal result. Integer math is always faster than floating-point math because it doesn’t require the offloading of the operation to a math coprocessor. If you do need to do math with decimal values, the Double data type is faster than the Currency data type.

    The following table ranks the numeric data types by calculation speed.

Numeric data type     Speed
    Long                  Fastest
    Currency              Slowest

    Cache Frequently Used Properties in Variables
    You can get and set the value of variables faster than those of properties. If you are getting the value of a property frequently (such as in a loop), your code runs faster if you assign the property to a variable outside the loop and then use the variable instead of the property. Variables are generally 10 to 20 times faster than properties of the same type.

    Never get the value of any given property more than once in a procedure unless you know the value has changed. Instead, assign the value of the property to a variable and use the variable in all subsequent code. For example, code like this is very slow:

For i = 0 To 10
        picIcon(i).Left = picPallete.Left
    Next i

    Rewritten, this code is much faster:

    picLeft = picPallete.Left
    For i = 0 To 10
        picIcon(i).Left = picLeft
    Next i

    Likewise, code like this . . .

Do Until EOF(F)
        Line Input #F, nextLine
        Text1.Text = Text1.Text + nextLine
    Loop

    . . . is much slower than this:

    Do Until EOF(F)
        Line Input #F, nextLine
        bufferVar = bufferVar & nextLine & vbCrLf
    Loop
    Text1.Text = bufferVar

    However, this code does the equivalent job and is even faster:

Text1.Text = Input(LOF(F), F)

    As you can see, there are several methods for accomplishing the same task; the best algorithm is also the best optimization.

    This same technique can be applied to return values from functions. Caching function return values avoids frequent calls to the run-time dynamic-link library (DLL), Msvbvm60.dll.

    Use Module-level Variables Instead of Static Variables
    While variables declared as Static are useful for storing a value over multiple executions of a procedure, they are slower than local variables. By storing the same value in a module-level variable your procedure will execute faster. Note, however, that you will need to make sure that only one procedure is allowed to change the module-level variable. The tradeoff here is that your code will be less readable and harder to maintain.
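    A quick sketch of that trade-off (procedure and variable names here are illustrative only):

    VB Code:
    'Slower: Static preserves its value between calls, at a small per-call cost
    Private Sub CountWithStatic()
        Static callCount As Long
        callCount = callCount + 1
    End Sub

    'Faster: a module-level variable does the same job...
    Private m_callCount As Long

    Private Sub CountWithModuleVar()
        m_callCount = m_callCount + 1   '...but every procedure in the module can now touch it
    End Sub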

    Replace Procedure Calls with Inline Code
    Although using procedures makes your code more modular, performing each procedure call always involves some additional work and time. If you have a loop that calls a procedure many times, you can eliminate this overhead by removing the procedure call and placing the body of the procedure directly within the loop. If you place the same code inline in several loops, however, the duplicate code increases the size of your application. It also increases the chances that you may not remember to update each section of duplicate code when you make changes.

Likewise, calling a procedure that resides in the same module is faster than calling the same procedure in a separate .BAS module; if the same procedure needs to be called from multiple modules, this gain will be negated.

    Use Constants Whenever Possible
    Using constants makes your application run faster. Constants also make your code more readable and easier to maintain. If there are strings or numbers in your code that don’t change, declare them as constants. Constants are resolved once when your program is compiled, with the appropriate value written into the code. With variables, however, each time the application runs and finds a variable, it needs to get the current value of the variable.

    Whenever possible, use the intrinsic constants listed in the Object Browser rather than creating your own. You don’t need to worry about including modules that contain unused constants in your application; when you make an .exe file, unused constants are removed.

    Pass Unmodified Arguments with ByVal Instead of ByRef
    When writing Sub or Function procedures that include unmodified arguments, it is faster to pass the arguments by value (ByVal) than to pass them by reference (ByRef). Arguments in Visual Basic are ByRef by default, but relatively few procedures actually modify the values of their arguments. If you don’t need to modify the arguments within the procedure, define them as ByVal, as in the following example:

    Private Sub DoSomething(ByVal strName As String, _
    ByVal intAge As Integer)

    Use Typed Optional Arguments
    Typed optional arguments can improve the speed of your Sub or Function calls. In prior versions of Visual Basic, optional arguments had to be Variants. If your procedure had ByVal arguments, as in the following example, the 16 bytes of the Variant would be placed on the stack.

    Private Sub DoSomething(ByVal strName As String, _
    Optional ByVal vntAge As Variant, _
    Optional ByVal vntWeight As Variant)

    Your function uses less stack space per call, and less data is moved in memory, if you use typed optional arguments:

    Private Sub DoSomething(ByVal strName As String, _
    Optional ByVal intAge As Integer, _
    Optional ByVal intWeight As Integer)

    The typed optional arguments are faster to access than Variants, and as a bonus, you'll get a compile-time error message if you supply information of the wrong data type.

  9. #9
    Former Admin/Moderator MartinLiss's Avatar
    Join Date
    Sep 1999
    San Jose, CA

    Part 2

    Take Advantage of Collections
    The ability to define and use collections of objects is a powerful feature of Visual Basic. While collections can be very useful, for the best performance you need to use them correctly:

    Use For Each...Next rather than For...Next.

    Avoid using Before and After arguments when adding objects to a collection.

    Use keyed collections rather than arrays for groups of objects of the same type.
    Collections allow you to iterate through them using an integer For...Next loop. However, the For Each...Next construct is more readable and in many cases faster. The For Each...Next iteration is implemented by the creator of the collection, so the actual speed will vary from one collection object to the next. However, For Each...Next will rarely be slower than For...Next because the simplest implementation is a linear For...Next style iteration. In some cases the implementor may use a more sophisticated implementation than linear iteration, so For Each...Next can be much faster.

    It is quicker to add objects to a collection if you don't use the Before and After arguments. Those arguments require Visual Basic to find another object in the collection before it can add the new object.

    When you have a group of objects of the same type, you can usually choose to manage them in a collection or an array (if they are of differing types, a collection is your only choice). From a speed standpoint, which approach you should choose depends on how you plan to access the objects. If you can associate a unique key with each object, then a collection is the fastest choice. Using a key to retrieve an object from a collection is faster than traversing an array sequentially. However, if you do not have keys and therefore will always have to traverse the objects, an array is the better choice. Arrays are faster to traverse sequentially than collections.

    For small numbers of objects, arrays use less memory and can often be searched more quickly. The actual number where collections become more efficient than arrays is around 100 objects; however, this can vary depending on processor speed and available memory.

    For More Information See "Using Collections as an Alternative to Arrays" in "More About Programming."
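    As a rough illustration of the keyed-lookup point (the paths and entry strings here are made up):

    VB Code:
    Dim entries As New Collection
    Dim oEntry As String

    'Add each item with a unique string key - here, the file path
    entries.Add "C:\a.txt,1024,2003-01-01", "C:\a.txt"
    entries.Add "C:\b.txt,2048,2003-02-02", "C:\b.txt"

    'Keyed retrieval - no traversal of the collection needed
    oEntry = entries("C:\b.txt")

    Collection keys are compared case-insensitively, which happens to match the vbTextCompare search used earlier in this thread.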

  10. #10
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
One part of that is wrong. Integer math is only faster for addition and subtraction. Floating-point math is faster for multiplication and division, as long as you are using a Pentium-class processor.

    However, that won't help much.

In this case, it's the choice of algorithm that will make all the difference. You look for a string within an entire 5 MB file. That may be the best you can do, if the string could occur anywhere within the file. However, random placement is unlikely. I would suggest that you try this:

    Instead of reading the whole file into one string, read it line by line in a loop, checking each line. This way, you will only be searching smaller chunks at each step, and can terminate the loop early once the target has been found.

    Of course, if the string is not found, you will still end up searching the entire file, and in that case this will probably be slower overall. Also, this may not actually be faster at all, depending on how InStr() works. If InStr() takes a constant time to return for strings of the same length, regardless of where the target is found, then my suggestion could save significant time. However, if InStr() returns faster for a target located near the beginning of the string than for one near the middle or end, then my suggestion would probably slow the program down in all cases. Without testing, you can't be sure.
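    That line-by-line approach might look something like this (an untested sketch; sBaselineFile is a placeholder for wherever the original file lives, and it assumes each entry sits on its own line):

    VB Code:
    Dim fNum As Integer
    Dim oneLine As String

    fNum = FreeFile
    Open sBaselineFile For Input As #fNum
    Do Until EOF(fNum)
        Line Input #fNum, oneLine
        If InStr(1, oneLine, FilePath, vbTextCompare) > 0 Then
            Exit Do   'found - stop reading early
        End If
    Loop
    Close #fNum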

  11. #11
    Former Admin/Moderator MartinLiss's Avatar
    Join Date
    Sep 1999
    San Jose, CA
    Originally posted by Shaggy Hiker
    One part of that is wrong. Integer math is only faster using addition and subtraction. Floating point math is faster for multiplications and division, so long as you are using a Pentium class processor.....
    You're right. I checked using the following code. The Long division loop takes about 210 ms (on my PC) while the Double loop takes about 150 ms.

    VB Code:
Option Explicit

    Private Declare Function GetTickCount Lib "kernel32" () As Long

    Private Sub Command1_Click()
        Dim lngStart As Long
        Dim lngFinish As Long
        Dim lngCounterOne As Long
        Dim lngCounterTwo As Long
        Dim lngLong1 As Long
        Dim lngLong2 As Long
        Dim lngResult As Long
        Dim dblFloat1 As Double
        Dim dblFloat2 As Double
        Dim dblResult As Double

        lngLong1 = 9
        lngLong2 = 3
        dblFloat1 = 9
        dblFloat2 = 3

        lngStart = GetTickCount()

    '    For lngCounterOne = 1 To 3000000
    '        lngResult = lngLong1 / lngLong2
    '    Next lngCounterOne
        For lngCounterOne = 1 To 3000000
            dblResult = dblFloat1 / dblFloat2
        Next lngCounterOne

        lngFinish = GetTickCount()

        Debug.Print CStr(lngFinish - lngStart)
    End Sub

  12. #12
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
I did the same test about a year ago when I first read that assertion. Kind of a nice advance. All books on high-speed graphics for pre-Pentium processors talked about fixed-point math, which required assembly routines. Now that floating point is faster, the calculations have changed.

  13. #13
    So Unbanned DiGiTaIErRoR's Avatar
    Join Date
    Apr 1999

    I've had Integers out-perform Longs, and have had ByRef out-perform ByVal substantially.

    A lot of it is going to depend on memory allocation, which is SLOW in VB. So sometimes ByRef will be quite a bit faster.

    Probably the best guideline to determine your types should be
    1) How much memory you'll allocate at most.
2) What types you're comparing. A Long and a Long will process faster than a Long and an Integer; however, in my personal benchmarks an Integer and an Integer perform the best (speed/memory usage).

When deciding to pass ByRef or ByVal, you'll have to take into consideration that each ByVal call has to create another instance of that variable. That means more memory allocation, which will probably be slower, especially for Strings and Variants.


The Double takes 50 ms for me (190 ms using \), while the Long takes 190 ms (compiled).

    However, using the proper \ for the Longs, not / (this is integer division, after all), it takes 100 ms.

    Using integer types, it comes down to 60-70 ms.

    But the Byte type takes the cake at 40 ms.

    Whoops... I was putting them all back into longs! But it doesn't seem to make any difference with the lngResult as a byte or integer type.
    Last edited by DiGiTaIErRoR; Oct 24th, 2003 at 02:06 PM.

  14. #14
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
Are you saying that Integer is measurably different from Long? I had never heard that before, and can't figure out why it would be true. I could pull something out of my ass about why it would go the other way, but not with Long being slower than Integer.

  15. #15
    Only Slightly Obsessive jemidiah's Avatar
    Join Date
    Apr 2002
A binary comparison (vbBinaryCompare, as opposed to the text comparison vbTextCompare) might also be faster.
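    For example, swapping the compare mode in the original search - at the cost of losing case-insensitive matching:

    VB Code:
    xPos = InStr(1, blData, FilePath, vbBinaryCompare)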
    The time you enjoy wasting is not wasted time.
    Bertrand Russell

