Results 1 to 14 of 14

Thread: Index makes only a tiny performance improvement

  1. #1

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2010
    Posts
    762

    Index makes only a tiny performance improvement

    Using vbRichClient5.dll for SQLite, I have a table like this:

    Code:
       Set Cnn1 = New_c.Connection(, DBCreateInMemory)
       Cnn1.AddUserDefinedCollation New CCollateExplorer
       
       With Cnn1.NewFieldDefs
          .Add "EXPLORER_LOCATION   Text     Collate  ExplorerSort"
          .Add "EXPLORER_ITEM_NAME  Text     Collate  ExplorerSort"
          .Add "EXPLORER_ITEM_TYPE  Integer"                                       ' This can be 0 or 1. 0 means it is a folder (subfolder). 1 means that it is a file
       End With
       Cnn1.CreateTable "EXPLORER_LIST"
    Then I populate it with file names and paths from a parent path.
    The number of records that are populated (that is the number of the files in that parent path) is 10540

    Then I read the above records sorted by the combination of EXPLORER_ITEM_TYPE and EXPLORER_ITEM_NAME.
    The reason for combination is that I want the code to give me first the folders (subfolders under the parent folder) and then the files.

    The problem (and the question) is that the use of an index makes just a tiny speed improvement (and not a large improvement as I expect).

    Here is the query without the use of index:
    Code:
       sql = "select  EXPLORER_LOCATION, EXPLORER_ITEM_NAME, EXPLORER_ITEM_TYPE  from  EXPLORER_LIST  order by  EXPLORER_ITEM_TYPE, EXPLORER_ITEM_NAME"
       Set Rs = Cnn1.OpenRecordset(sql)
    And here is the query with the use of an index:
    Code:
       sql = "create index if not exists  idx1  ON  EXPLORER_LIST ( EXPLORER_ITEM_TYPE, EXPLORER_ITEM_NAME )"
       Cnn1.Execute sql
    Code:
       sql = "select  EXPLORER_LOCATION, EXPLORER_ITEM_NAME, EXPLORER_ITEM_TYPE  from  EXPLORER_LIST  indexed by  idx1  order by  EXPLORER_ITEM_TYPE, EXPLORER_ITEM_NAME"
       Set Rs = Cnn1.OpenRecordset(sql)
    I measure the time taken by the query and also the time to create the index.
    After excluding the index creation time, I compare the time taken by the main query alone (only the select query)

    When I repeat this test and comparison many times and I compare the select time using index as opposed to select time without index, I get an improvement of 6 to 8 percent.
    At most, no better than 10 percent!
    Why is it that the index makes such a tiny improvement?
    Why doesn't it make a large improvement?

    Please advise.
    Thanks

  2. #2
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: Index makes only a tiny performance improvement

    Could you show the Code you currently have in your Class CCollateExplorer?

    Olaf

  3. #3
    PowerPoster
    Join Date
    Aug 2010
    Location
    Canada
    Posts
    2,412

    Re: Index makes only a tiny performance improvement

    Does it perform any better if you include the ExplorerSort collation in the index creation? e.g.

    Code:
       sql = "create index if not exists  idx1  ON  EXPLORER_LIST ( EXPLORER_ITEM_TYPE COLLATE ExplorerSort, EXPLORER_ITEM_NAME  COLLATE ExplorerSort)"

  4. #4
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: Index makes only a tiny performance improvement

    FWIW, here again my (now slightly changed) example, which I've posted in an earlier thread:

    Form-Code:
    Code:
    Option Explicit
    
    Private Cnn As cConnection
    
    Private Sub Form_Load()
      New_c.Timing True
      
      Set Cnn = New_c.Connection(, DBCreateInMemory)
          Cnn.AddUserDefinedCollation New cCollateExplorer
          Cnn.Execute "Create Table Files(ID Integer Primary Key, FileType Integer, FileName Text Collate ExplorerSort)"
     
      Dim Cmd As cCommand
      Set Cmd = Cnn.CreateCommand("Insert Into Files(FileType, FileName) Values(?,?)")
      
      Dim DL As cDirList, i As Long
      Set DL = New_c.FSO.GetDirList(New_c.FSO.GetSpecialFolder(CSIDL_SYSTEMX86))
      
      For i = 0 To DL.FilesCount - 1
        Select Case LCase$(New_c.FSO.GetFileExtension(DL.FileName(i)))
          Case "dll": Cmd.SetInt32 1, 1
          Case "exe": Cmd.SetInt32 1, 2
          Case Else:  Cmd.SetInt32 1, 0
        End Select
        Cmd.SetText 2, DL.FileName(i)
        Cmd.Execute
      Next
        
      Caption = Cnn.OpenRecordset("Select Count(*) From Files")(0) & " FileNames added after:" & New_c.Timing
      
      If MsgBox("Create combined Index on FileType,FileName?", vbYesNo) = vbYes Then
        Cnn.Execute "Create Index idxFiles_FileType_FileName On Files(FileType, FileName)"
      End If
    End Sub
     
    Private Sub Form_Click()
      New_c.Timing True
        Dim Rs As cRecordset
        Set Rs = Cnn.OpenRecordset("Select * From Files Order By FileType, FileName")
      Caption = Rs.RecordCount & " Records sorted after:" & New_c.Timing
    End Sub
    ClassCode (cCollateExplorer):
    Code:
     Option Explicit
    
    Implements ICollation
    
    Private SC As cStringCompare
    
    Private Sub Class_Initialize()
      Set SC = New_c.StringCompare
    End Sub
    
    Private Function ICollation_CallbackCollate(ByVal ZeroBasedNameIndex As Long, S1 As String, S2 As String) As Long
      Select Case ZeroBasedNameIndex
        Case 0 ' Explorer-like Sorting via cmpLogical (StrCompLogicalW)
        
          ICollation_CallbackCollate = SC.CompareString(S1, S2, cmpLogical)
    '      ICollation_CallbackCollate = New_c.StringCompare.CompareString(S1, S2, cmpLogical)
    
        Case Else: Debug.Assert False
      End Select
    End Function
    
    Private Property Get ICollation_DefinedNames() As String
      ICollation_DefinedNames = "ExplorerSort"
    End Property
    And with both modules in a new Project, I get the following timings when sorting by FileType, FileName -
    (in Form_Click, over the 2918 Files from my \SysWOW64-Folder):
    - without the combined Index: ....... 8.2msec
    - with a created combined Index: ... 1.5msec

    So, the index is indeed worthwhile, since it speeds things up by about factor 5 here
    (this would be an even higher factor with more data).

    HTH

    Olaf
    Last edited by Schmidt; Oct 5th, 2019 at 06:05 PM.

  5. #5
    Addicted Member Wolfgang Enzinger's Avatar
    Join Date
    Apr 2014
    Location
    Munich, Germany
    Posts
    160

    Re: Index makes only a tiny performance improvement

    Quote Originally Posted by IliaPreston View Post
    Why is it that the index makes such a tiny improvement?
    Why doesn't it make a large improvement?
    Well, since there is no WHERE clause in your SQL query, sorting is the only thing that the index is beneficial for.

    Actually using an index means some kind of overhead. This overhead is worth its price if there's a WHERE clause that reduces the numer of rows to be read from the DB significantly. Without that: not so much.

    Wolfgang

  6. #6
    PowerPoster wqweto's Avatar
    Join Date
    May 2011
    Location
    Sofia, Bulgaria
    Posts
    5,120

    Re: Index makes only a tiny performance improvement

    Quote Originally Posted by IliaPreston View Post
    Why doesn't it make a large improvement?
    Your final select includes EXPLORER_LOCATION column which is not part of idx1 index so it's content has to be fetch from the base table which is expensive as each row has to be first located and then fully read (all of it) in memory.

    If you omit this column in the query then idx1 will become covering index for your final select and base table should not be accessed in theory.

    Another option is to include EXPLORER_LOCATION column in idx1 index as last column (don't think sqlite allows specifying INCLUDE columns in an index like MSSQL does for covering indexes for instance)

    cheers,
    </wqw>

  7. #7

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2010
    Posts
    762

    Re: Index makes only a tiny performance improvement

    Thanks for all the great help and advice.
    In the end, it looks like it was my own measurement that was wrong and that the index (unbenownst to me) did actually make a large speed improvement (running in about one fourth of the time as compared to the query without index. That is an improvement of about 75 to 76 percent)

    My mistake was that the two processes (one without index, and the other with index) contained some other code aside from the main select query.
    Those other code were code to populate the table from the arrays in a loop in the beginning and reading from the output of the select statement to repopulate the arrays in another loop in the end.
    In the case of the test with index, there was also the index creation.

    My measurement mistakenly included all of the above except index creation. So I measured the time taken to do all of the above (instead of just the select query) once with index, and another time without the index, and compared them.
    That was wrong measurement. After correcting my measurement to measure only the select statement, I see that the select statement runs in less than 25% of the time when it uses the index.

    However, the point that jpbro raised is another interesting issue that I will address in the next post

    thanks.
    Ilia

  8. #8

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2010
    Posts
    762

    Re: Index makes only a tiny performance improvement

    Quote Originally Posted by Schmidt View Post
    Could you show the Code you currently have in your Class CCollateExplorer?

    Olaf
    In my class CCollateExplorer I use exactly the code that you provided in post #14 of the thread http://www.vbforums.com/showthread.p...-Explorer-does
    Here is my class:
    Code:
    Option Explicit
    
    Implements ICollation
    
    Private SC As cStringCompare
    
    Private Sub Class_Initialize()
      Set SC = New_c.StringCompare '<- pre-create the Comparer-instance here
    End Sub
    
    Private Function ICollation_CallbackCollate(ByVal ZeroBasedNameIndex As Long, S1 As String, S2 As String) As Long
      Select Case ZeroBasedNameIndex
        Case 0 ' Explorer-like Sorting via cmpLogical (StrCompLogicalW)
        
          ICollation_CallbackCollate = SC.CompareString(S1, S2, cmpLogical)
    '      ICollation_CallbackCollate = New_c.StringCompare.CompareString(S1, S2, cmpLogical)
    
        Case Else: Debug.Assert False
      End Select
    End Function
    
    Private Property Get ICollation_DefinedNames() As String
      ICollation_DefinedNames = "ExplorerSort"
    End Property

  9. #9

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2010
    Posts
    762

    Re: Index makes only a tiny performance improvement

    Quote Originally Posted by jpbro View Post
    Does it perform any better if you include the ExplorerSort collation in the index creation? e.g.

    Code:
       sql = "create index if not exists  idx1  ON  EXPLORER_LIST ( EXPLORER_ITEM_TYPE COLLATE ExplorerSort, EXPLORER_ITEM_NAME  COLLATE ExplorerSort)"
    There are three possible ways of doing it:

    1. Using the collation in the table creation statement for the column(s) that the collation pertains to.
    This results in the best performance and the best improvement with index:
    select query with index takes 11 milliseconds. The same select query without index takes 49 milliseconds

    2. Creating the table without collation, and then creating the index with collation:
    Code:
       sql = "create index if not exists  idx1  ON  EXPLORER_LIST ( EXPLORER_ITEM_TYPE, EXPLORER_ITEM_NAME Collate ExplorerSort )"
       Cnn1.Execute sql
    In this case the performance is relatively good whether with or without index), and the index makes a tiny improvement:
    select query with index takes 17 milliseconds. The same select query without index takes 18 milliseconds

    3. Creating the table and the index without any collation, but using the collation in the select query (in the order by clause):
    Code:
       If UseIndex Then
          sql = "select  EXPLORER_LOCATION, EXPLORER_ITEM_NAME, EXPLORER_ITEM_TYPE  from  EXPLORER_LIST  indexed by  idx1  order by  EXPLORER_ITEM_TYPE, EXPLORER_ITEM_NAME  Collate  ExplorerSort"
       Else
          sql = "select  EXPLORER_LOCATION, EXPLORER_ITEM_NAME, EXPLORER_ITEM_TYPE  from  EXPLORER_LIST  order by  EXPLORER_ITEM_TYPE, EXPLORER_ITEM_NAME  Collate  ExplorerSort"
       End If
    This has terrible performance (with or without the index):
    select query with index takes 53 milliseconds. The same select query without index takes 52 milliseconds!!! (the index actually slows down a tiny bit!!!)

    So, in my specific scenario, the best choice is to use the collation in table creation.
    However, I am curious to know what everybody else in here thinks.

    Under what circumstances would you recommend option 1 (using collation in table creation)?
    Under what circumstances would you recommend option 2 (using collation in index creation)?
    Under what circumstances would you recommend option 3 (using collation in the order by clause or where clause)?

    Thanks.
    Ilia

  10. #10
    Addicted Member Wolfgang Enzinger's Avatar
    Join Date
    Apr 2014
    Location
    Munich, Germany
    Posts
    160

    Re: Index makes only a tiny performance improvement

    Quote Originally Posted by IliaPreston View Post
    Under what circumstances would you recommend option 1 (using collation in table creation)?
    Under what circumstances would you recommend option 2 (using collation in index creation)?
    Under what circumstances would you recommend option 3 (using collation in the order by clause or where clause)?
    One should keep in mind that every TEXT column has a default collation. That is:

    * if you omit the COLLATE clause in the CREATE TABLE statement, it defaults to BINARY
    * if you omit the COLLATE clause in the CREATE INDEX statement, it defaults to the collation defined (implicit or explicit) in the CREATE TABLE statement
    * if you omit the COLLATE clause in the SELECT statement, it defaults to the collation defined (implicit or explicit) in the CREATE TABLE statement; after that is determined, the engine checks if an appropriate index ist available.
    * if you have a COLLATE clause in the SELECT statement, it is, not surprising, being used ; then, the engine checks if an appropriete index ist available.

    So, if you aren't planning on switching between collations, I'd go with option 1. If you are, however, then I'd go with option 2 and 3.

    Wolfgang

  11. #11
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: Index makes only a tiny performance improvement

    Quote Originally Posted by IliaPreston View Post
    There are three possible ways of doing it:
    Under what circumstances would you recommend option 1 (using collation in table creation)?
    Under what circumstances would you recommend option 2 (using collation in index creation)?
    Under what circumstances would you recommend option 3 (using collation in the order by clause or where clause)?
    I'd use 1 as often as possible, especially on Text-Columns which you later want to scan with case-insensitive comparisons
    (as e.g. [Name] or [Description] Fields, which one might want to search via Like-Operator).

    Even simple comparisons for "equalness" on Name-Fields as e.g.:
    Select * From Persons Where FirstName='ilia'
    would find the Record, with FirstName-content "Ilia", when you made already in your table-def sure,
    that the NoCase collation shall be used (for case-insensitive comparisons).
    Also an index you create, will automatically "build itself" using the Collation which was already given in the Table-Def -
    if you leave Collation-specification for Text-Fields out in the Table-Def (and also in the Index-Creation-Def),
    then the index will be created with SQLites default "Binary-Collation" (case-sensitive).

    The above answers also, why your "dynamic specification" in a given Select had no effect,
    even when an index existed ... that's because you created the index with default Binary-Collation -
    and so SQLite did not choose it to speed up your "explicitely given Sort Collation"...
    So, "only when the Collation-Types match" (in a given Index, as well as in a dynamically given Collation in a Select), will the Index be of use.

    So, option 1 (specifying the collation in the Field-Def) is generally a good idea -
    (because it eases Index-Creation, as well as Selects - which simply "inherit" the Collation from the table-field-def)
    That's especially true, when you're using "already built-in Collations" like "Binary" or "NoCase".

    If you're using "your own Userdefined-Collations" (same goes for "your own Userdefined-Functions"),
    you'll have to ask yourself, whether your DB might potentially be used "elsewhere" -
    outside the "library-context of your App" (in other Apps or on other systems).

    If there's a high-likelihood that this might be the case - then I'd leave "non-built-in" Collation-Specifications
    out of the Table-Definitions - and would also be careful to "hand the DB over" when it contains Indexes -
    which were created based on your own Userdefined-Collations, because the "foreign target-App" that plans to use your DB,
    might not have an easy way to implement the behaviour of your UserDefined-Collation in exactly the same way.

    E.g. if it's a .NET-App (which usually runs on Windows as well) you might be lucky - instructing the .NET-Devs to implement their own ExplorerSort-Collation-Callback-Class (implementing it using the very same System-API for "logical compares") -
    but if such a DB will go onto a Linux- or Android-system, then there is of course the possibility, that the Collation-Callback could
    be implemented (in C or Python or TCL, which all support those callbacks) - but the "Implementation-behaviour" would not be the "exact same thing", because the CompareStringLogicalW-API is not available on those systems.

    HTH

    Olaf
    Last edited by Schmidt; Oct 6th, 2019 at 06:25 AM.

  12. #12

    Thread Starter
    Fanatic Member
    Join Date
    Mar 2010
    Posts
    762

    Re: Index makes only a tiny performance improvement

    Quote Originally Posted by Schmidt View Post
    I'd use 1 as often as possible, especially on Text-Columns which you later want to scan with case-insensitive comparisons
    (as e.g. [Name] or [Description] Fields, which one might want to search via Like-Operator).

    Even simple comparisons for "equalness" on Name-Fields as e.g.:
    Select * From Persons Where FirstName='ilia'
    would find the Record, with FirstName-content "Ilia", when you made already in your table-def sure,
    that the NoCase collation shall be used (for case-insensitive comparisons).
    ......
    Olaf
    Thanks a lot for the explanation.
    I am a little confused about the equalness example that you provided.

    Please correct me if I am wrong: What I understand is that with a table column created in the table def statement with the default collation (binary), the characters are stored such that their storage in terms of bits and bytes are in a sequence like this:
    ...... A, B, C, D, E, ......, Z, ... other characters ... a, b, c, d, e, ......, z, ......

    Therefore when a comparison (with or without index) is done on the actual numeric values of those sets of bits and bytes, that comparison automatically coincides with the above sequence.

    And with NOCASE collation, the characters are stored such that their storage in terms of bits and bytes are in a sequence like this:
    ...... A, a, B, b, C, c, D, d, ......, Z, z, ......

    Therefore when a comparison (with or without index) is done on the actual numeric values of those sets of bits and bytes, that comparison automatically coincides with the above sequence. For example "a" falls between "A" and "B"

    Therefore with a NOCASE collation, if you test two strings to see which one is lesser and which one is greater, the comparison is done case-insensitively.

    But, EQUALNESS is another matter.
    I don't understand how the NOCASE collation can result in the test
    HTML Code:
     where "ilia" = "Ilia"
    returning True

    With the NOCASE collation as well as any other collation, the letter "i" is stored as two or three bytes (16 or 24 bits) and the letter "I" is stored as another (different) two or three bytes (16 or 24 bits) and these sets of bits and bytes for "i" and "I" are DIFFERENT.
    So, when an EQUALNESS comparison between these two sets of bits and bytes is done, the result of the comparison cannot be EQUALITY.

    Or maybe I am missing something in here.

    Please advise.
    Thanks.
    Ilia
    Last edited by IliaPreston; Oct 6th, 2019 at 07:34 PM.

  13. #13
    Addicted Member Wolfgang Enzinger's Avatar
    Join Date
    Apr 2014
    Location
    Munich, Germany
    Posts
    160

    Re: Index makes only a tiny performance improvement

    Quote Originally Posted by IliaPreston View Post
    But, EQUALNESS is another matter.
    I don't understand how the NOCASE collation can result in the test
    HTML Code:
     where "ilia" = "Ilia"
    returning True

    With the NOCASE collation as well as any other collation, the letter "i" is stored as two or three bytes (16 or 24 bits) and the letter "I" is stored as another (different) two or three bytes (16 or 24 bits) and these sets of bits and bytes for "i" and "I" are DIFFERENT.
    So, when an EQUALNESS comparison between these two sets of bits and bytes is done, the result of the comparison cannot be EQUALITY.

    Or maybe I am missing something in here.
    Yes.

    Imagine you want to do a NOCASE like equality check in VB. What would you do? Probably somthing like this:

    Code:
      If UCase$("ilia") = UCase$("Ilia") Then ...
    And that is what SQLite does, basically, whenever it notes that a NOCASE comparison ist desired.

    Wolfgang

  14. #14
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,219

    Re: Index makes only a tiny performance improvement

    Collations are (in the end) just "named Comparer-Functions" (the built-in ones very similar to those you implement yourself).

    E.g. SQLites builtin Binary-Comparer-Function is based on memcmp:
    https://www.tutorialspoint.com/c_sta...ion_memcmp.htm

    And all those Functions return a "TriState-result", depending on the two input-strings:

    • if Return value < 0 then it indicates str1 is less than str2.
    • if Return value > 0 then it indicates str2 is less than str1.
    • if Return value = 0 then it indicates str1 is equal to str2.


    We have a similar function available with VBA.StrComp(...), where the 3rd Param decides over Binary- or NoCase-comparisons.
    But note the returned 0 (which is the indicator for equalness between strings).

    Here is a simple example, which you might want to step-through via F8, to get an idea what the DB-Engine does when...
    Code:
    Option Explicit
    
    Implements ICollation
    
    Private Sub Form_Load()
      With New_c.Connection(, DBCreateInMemory)
        .AddUserDefinedCollation New Form1
        .Execute "Create Table T(TxtB Text Collate MyBinary, TxtN Text Collate MyNoCase)"
        .Execute "Insert Into T Values('ilia','ilia')"
        
        Debug.Print "Records found: "; .OpenRecordset("Select * From T Where TxtB='Ilia'").RecordCount
        Debug.Print "Records found: "; .OpenRecordset("Select * From T Where TxtN='Ilia'").RecordCount
      End With
    End Sub
     
    Private Function ICollation_CallbackCollate(ByVal ZeroBasedNameIndex As Long, S1 As String, S2 As String) As Long
      ICollation_CallbackCollate = VBA.StrComp(S1, S2, ZeroBasedNameIndex)
      Debug.Print S1, S2, ICollation_CallbackCollate
    End Function
    
    Private Property Get ICollation_DefinedNames() As String
      ICollation_DefinedNames = "MyBinary,MyNoCase"
    End Property
    HTH

    Olaf

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width