Results 1 to 19 of 19

Thread: linQ - Need to optimize slow subquery code (.contains)

  1. #1

    Thread Starter
    Lively Member
    Join Date
    Jul 2009
    Posts
    85

    Lightbulb linQ - Need to optimize slow subquery code (.contains)

    Hello,

    I'm using this code to get a subquery :

    Code:
    dim lnQ = (From data In tblData _
                 Where data.Genre_Id = "1" _
                 OrElse (From genre In tblStructure _
                              Where genre.parent_Id = "1" _
                         Select genre.Id).Contains(data.Genre_Id) _
                 Select data).ToList


    Basically, I have 2 datatables

    - The first is a Treeview structure
    Code:
    Category (.id="1")
      |- Genre a (.id="13")
      |- Genre b (.id="15")
      |- ...
      |- Genre n
    - The second keeps the data (20.000 rows)


    I need to pull "all rows from the category, including all rows from it's subcategories"

    My code works, but it has dramatic results (20/30 seconds to get it done), when I need an instant response. I know the .Contains is really bad.

    What would be a good fix to totally get rid of the .contains ???

    Thanks in advance!

  2. #2
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    39,043

    Re: linQ - Need to optimize slow subquery code (.contains)

    Don't use LINQ?

    LINQ is slow, in general, but the fundamental solution is to reconsider the problem. I would guess that the "1" you are using is just an example, or else there isn't any point in the LINQ, but one thing you might consider is putting the tblStructure into a Dictionary with the parent ID as the key. Finding out whether or not a Dicitonary holds a key is VERY fast compared to what you are doing. Whether or not this will work in your case is harder to say.
    My usual boring signature: Nothing

  3. #3

    Thread Starter
    Lively Member
    Join Date
    Jul 2009
    Posts
    85

    Re: linQ - Need to optimize slow subquery code (.contains)

    Thanks for your input.

    Yes the "1" is an exemple indeed. The project is of course much more complex but I just changed it make it easier for here. (But the nodes from the tree structure really holds a unique Id, so that when the user chose one node I already have the ID).



    I know that linQ is slow sometimes, but I guess it's more because of the programmer (me). I mean, most of the other queries do the job in 0.0.100, so 100 milliseconds (for the same exact database with 20.000 rows). That's pretty good.

    Just this specific query, with .Contains is a disaster.
    I'm sure there are some good workaround to optimize it.



    Thanks for the dictionnary option, but the problem is not when querying the table with the structure, but rather the main table in the database. I have no choice of doing a query on it (there are about 10 datatables with relations, primary keys etc..).

    In other words, the subquery is quite fast, but the main query (with .contains) is the problem. I tryed them both seperatly..
    Last edited by castaway; Feb 7th, 2015 at 09:09 PM.

  4. #4
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,255

    Re: linQ - Need to optimize slow subquery code (.contains)

    Quote Originally Posted by castaway View Post
    Just this specific query, with .Contains is a disaster.
    I'm sure there are some good workaround to optimize it.
    Can't advise you about a better LINQ-Statement, but in case it is the
    data-record of the Parent-Node you're after, followed by the data-records
    of its first-level-descendants, then (in SQL) a simple UNION ALL would suffice...
    (there should be an equivalent for a UNION in LINQ-syntax).

    Also not sure, if you already have an index in place on your ParentID-Field(s) -
    since that usually speeds up things as well.

    Just played that through here with an SQLite-based InMemory-DB, to give you
    an impression about the timings, such things usually (should) take.

    The following is VB6-code - but from the Create Table Stament and the used SQL
    you should be able to deduce what I was doing on this simple Adjacency-List-example...
    (which is two levels deep: 200ParentNodes with 200ChildNodes each = 40200 records total) :

    Code:
    Option Explicit
    
    Private Const NperLvl& = 200 '<- adjust the Nodes per Level here
    
    Private i&, j&, MemDB As New cMemDB, NodeInsert As cCommand, Rs As cRecordset
     
    Private Sub Form_Load()
      MemDB.Exec "PRAGMA Foreign_Keys=ON" 'enable Foreign-Keys on this InMemory-DB
      
      MemDB.Exec "Create Table Nodes(   NID Text Primary Key," & _
                                   "    PID Text References Nodes On Update Cascade On Delete Cascade" & _
                                   ") Without RowID"
                                   
      'Insert example-data into the Adjacency-List we created above
      Set NodeInsert = MemDB.CreateCommand("Insert Into Nodes Values(?,?)")
      New_c.Timing True
        For i = 1 To NperLvl:    InsertNode "N_" & i 'ParentNodes
           For j = 1 To NperLvl: InsertNode "NC_" & i & "_" & j, "N_" & i 'ChildNodes
        Next j, i
      Debug.Print "Inserts on the Adj-Lst", New_c.Timing, MemDB.GetCount("Nodes")
      
      'set an additional Index on the ParentID, before we proceed
      MemDB.Exec "Create Index Idx_Nodes_PID On Nodes(PID)"
    
      New_c.Timing True 'NID-Updates will affect the ChildNodes PID too (due to Update Cascade in our Table-Def)
        MemDB.Exec "Update Nodes Set NID='N_0' Where NID='N_1'"
      Debug.Print "Node-Rename (cascaded)", New_c.Timing, MemDB.GetCount("Nodes")
    
      New_c.Timing True 'NID-Deletes will delete the ChildNodes as well (due to Delete Cascade)
        MemDB.Exec "Delete From Nodes Where NID='N_0'" 'delete the just renamed Node (and all its Children)
      Debug.Print "Node-Delete (cascaded)", New_c.Timing, MemDB.GetCount("Nodes")
    
      New_c.Timing True
        Set Rs = GetNodeAndDescendants("N_3")
      Debug.Print "Select Node+Lvl1-Desc.", New_c.Timing, Rs.RecordCount
    End Sub
     
    Private Sub InsertNode(NodeID As String, Optional ParentID As String)
      NodeInsert.SetText 1, NodeID
      NodeInsert.SetText 2, ParentID
      NodeInsert.Execute
    End Sub
    
    Private Function GetNodeAndDescendants(NodeID As String) As cRecordset
      With MemDB.CreateSelectCommand("SELECT * FROM Nodes Where NID=@ID Union All Select * From Nodes Where PID=@ID")
        .SetText !ID, NodeID
        Set GetNodeAndDescendants = .Execute
      End With
    End Function
    The two SQL-Strings which are (IMO) performance-relevant also in your case, I've colored magenta.

    Here's the timings, the above code reports:

    Code:
    Inserts on the Adj-Lst       288.70msec    40200 
    Node-Rename (cascaded)       1.05msec      40200 
    Node-Delete (cascaded)       1.79msec      39999 
    Select Node+Lvl1-Desc.       0.86msec      201
    Same thing again, this time with a commented out Extra-Index-Creation on the ParentID
    Code:
    Inserts on the Adj-Lst       289.92msec    40200 
    Node-Rename (cascaded)       12.58msec     40200 
    Node-Delete (cascaded)       1,602.38msec  39999 
    Select Node+Lvl1-Desc.       4.56msec      201
    Leaving out the Index will (significantly) slow down especially the cascaded Node-Delete,
    though the other timings (albeit factor 5-12 slower) are still bearable.

    Olaf
    Last edited by Schmidt; Feb 8th, 2015 at 07:21 PM.

  5. #5
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    39,043

    Re: linQ - Need to optimize slow subquery code (.contains)

    There are workarounds, and Schmidt has shown a good one, though it may not be practical, either. If you can do the job in SQL when populating the tables, then that will generally be faster than any manipulation afterwards. What I was suggesting is that you have found a situation where the way you have the data organized is slowing you down, so store it a different way that works for that particular problem. There is no reason why data has to be stored only one way. Memory is cheap. I often have dictionaries for looking up one thing based on another, and sometimes I build reverse dictionaries with the same data to speed up a particular sub-case.

    In your case, I would guess that the LINQ being generated builds horrible intermediate tables MANY MANY times over as it iterates through the collections. Exactly how it will handle that scenario I can't say, but based on the slowdown you are seeing, I would say that you have a loop within a loop. Either loop on its own wouldn't be so bad, but combined you multiply the time by the number of iterations in some way that causes the time to explode in a more than exponential fashion. So, there may not be a simple, "do this, and all will be well" kind of solution. It may be that you need to have a different organization of the data that deals with this particular case, whether it be a different SQL query or something else.
    My usual boring signature: Nothing

  6. #6
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,255

    Re: linQ - Need to optimize slow subquery code (.contains)

    Quote Originally Posted by Shaggy Hiker View Post
    If you can do the job in SQL when populating the tables, then that will generally be faster than any manipulation afterwards.
    Sometimes direct SQL-queries will be slower than "SQL-like" querying from "InMemory-DB-like" DataStructures...
    Server-Roundtrips can be expensive - and well-balanced clientside processing is worthwhile to reduce ServerLoad -
    as demonstrated in modern WebApps with jQuery+Ajax and Browser-internally handled sorting, filtering etc...
    of Data which was selected and transferred from the serverside in less expensive to provide "Raw-ResultSets".

    Quote Originally Posted by Shaggy Hiker View Post
    What I was suggesting is that you have found a situation where the way you have the data organized is slowing you down, so store it a different way that works for that particular problem. There is no reason why data has to be stored only one way. Memory is cheap.
    Cheap memory (along with convenience) is the reason behind .NET-DataTables and SQL-like LINQ-expressions
    which in combination form the same thing as I was using above - an InMemory-DB-scenario, able to buffer
    "Raw-Selects" in a way, which one can later on conveniently perform "Selects, Filtering, Ordering, Joining, Grouping" on
    (in a decent performance).

    Quote Originally Posted by Shaggy Hiker View Post
    I often have dictionaries for looking up one thing based on another, and sometimes I build reverse dictionaries with the same data to speed up a particular sub-case.
    And exactly that should *not* be necessary when the alternative "InMemory-Convenience-Layer" (DataTables + LINQ)
    is well-implemented and supports Indexing (because indexed lookups are, what you perform on a Dictionary as well).

    Quote Originally Posted by Shaggy Hiker View Post
    In your case, I would guess that the LINQ being generated builds horrible intermediate tables MANY MANY times over as it iterates through the collections.
    Well, that's the case with inefficiently formulated "plain SQL" (in conjunction with missing or misplaced indexes) too.
    DataTables+LINQ should be able to perform basically at the same Level as my small SQLite-InMemory example...

    If that is not possible, then I'd not see the point of LINQ-integration (since the amount of Data the OP was
    mentioning (20000 records or so), is still in the lower range, perfectly suited for such InMemory-massaging.

    Perhaps somebody with more LINQ-experience will point out a (much) better performing expression -
    I was just trying to give some hints (especially with regards to indexing also the ParentKey-Column),
    where the problem might be located - and what the timings should come out like, after the problem
    was solved satisfyingly.

    Olaf
    Last edited by Schmidt; Feb 8th, 2015 at 04:48 PM.

  7. #7
    Bad man! ident's Avatar
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,398

    Re: linQ - Need to optimize slow subquery code (.contains)

    I have never seen a LINQ expression written in such a way. I am no expert with data tables so if you could give me a sample i could try to re-write the expression. PLINQ would also likely be more beneficial here. I would wager a compiled regex pattern would be quicker then contains over large querys but do not quote me on that.

  8. #8

    Thread Starter
    Lively Member
    Join Date
    Jul 2009
    Posts
    85

    Re: linQ - Need to optimize slow subquery code (.contains)

    First, thanks for your inputs. I appreciate it !
    Now I will give just a little more info for the program, before I answer. I think I actually found a good workaround.
    Info:

    1/ The database
    I use a *.mdb file to keep the database. It's perfect for my needs, extrely fast also.
    When the program loads, it will load a dataset.
    The data scheme (tables, relations, primary keys etc..) has been studied to be as efficient as possible. It's a real database (and not a single table with everything in it as we can see sometimes)
    The dataset is actually kind of a copy of the real database file. Works great.

    2/ The Treeview
    In my program, I have a treeview in the left side. On load, this treeview is loaded with its structure.
    To explain better : the first nodes are done by the program itself, but then the user can arrange it the way he wants. He can also drag and drop, making "sub categories".
    In the total, there are 4 kinds of nodes with different properties. For each node, I needed to keep a lot of different data, so I decided to store the treeview structure into a datatable. A typical database has something like 3000 nodes so a datatable was the good choice I think, to save the structure.

    The action
    Ok this is really straightforward : The user click on a treenode to see the files (from the chosen category) in a listview on the right side.

    So when a click is done :
    -a- I query the dataset, with an argument being the "ID" of the treenode.
    -b- The query gives me a list of files.
    -c- I populate the listview. The listview has about 8 different column, but I get everything from my query.

    This usually takes about 100 milliseconds.

    The engeenering
    I think the way I built this is rather "kiss" (keep it small and simple)

    The treenode has nothing to do with the problem I have for the linQ query. Sorry if I wasnt clear at the beginning.
    The treenode is just the way the user choses its ID to show in the listview. Please don't worry about the treeview, I can get the ID's I need in two lines of code and I'm fine with it.


    The recursiveness
    So far, so good.
    I then added an option : the recursive selection. Let's say a user want's to see all files from a category, AND all files from its children.

    I have many choices :
    - Query the dataset for every treenode one by one, starting with the chose node, then going through the childrens (usually there are max 15 sub categories). This is a good option and I can populate the listview each time
    - Do the same and keep the results in some temp variable, then populate the listview
    - Do a better query that will look for everything I need at the same time, and get everything in only one time

    As you can tell, at the moment I chose the third option.


    The problem
    As some of you have mentionned, there are loops.
    But my code is rather straighforward :
    Code:
    dim lnQ = (From data In tblData _
                 Where data.Genre_Id = "1" _
                 OrElse (From genre In tblStructure _
                              Where genre.parent_Id = "1" _
                           Select genre.Id).Contains(data.Genre_Id) _
                 Select data).ToList
    First there's a subquery : It will get a simple list of the children ID's. This is extremly fast.
    Second, the query for the files to populate : it check if the row has the corresponding Genre_ID, if not, another chance, it checks whether its Genre_Id is one of the childrens ID.

    Where are the loops ? Well there are being created when the compiler changes the linQ into real SQL.
    .Contains() == Where In

    I've read that this cause the compiler to do something like "where in (a) or (b) or (c) or (n...)"
    So, I guess for every row in the datatable that meet the first where clause, they are then compared to every options in the .Contains() extention, one by one. I dont know how and why, but there are loops at this moment.

    This is what gives the 20 seconds for this particular query.


    Some solutions
    First I though I would change a little this query to put it this way :
    Code:
    dim subquery = (From genre In tblStructure _
                                Where genre.Id = "1" _
                                OrElse genre.parent_Id = "1" _
                            Select genre.Id)
    
    dim lnQ = (From data In tblData _
                      Where subquery.Contains(data.Genre_Id) _
                 Select data)
    It works, results are correct.
    I just splitted the main query from the subquery.
    I added the main treenode ID directly with it's children ID's (so that it only compare once instead of 2x in the previous version)
    Impact on the query : NONE !
    Same time.
    This is where I understood the subquery is just fine. Debugging it shows me that it holds on the second query (where contains....)


    What's next ?
    I then though about going back to old time SELECT() method from the datatables.
    I'm sure I can do it.

    But I needed to try something before :

    Code:
    dim lnQ = (From data In tblData _
                                Where genre.Id = "1" _
                                  OrElse genre.Id = "13" _
                                  OrElse genre.Id = "15" _
                                  OrElse genre.Id = "17" _
                                  OrElse genre.Id = "18" _
                                  OrElse genre.Id = "24" _
                                  OrElse genre.Id = "25" _
                                  OrElse genre.Id = "27" _
                     Select genre.Id)
    Bam. Works perfectly, with less than a second.
    So, either I go with something like a tblData.Select("...") and I build the corresponding query here for the dataset

    or... I keep searching.


    Final stage : Regex !
    LinQ is just too good not to use it.
    Ok, the subqueries are really bad. The only solution I found on forums and internet was this .Contains() method, which actually seems logical, if only it wasnt so slow.

    Then I thought about regex.
    I love regex.

    This is what I came with :

    Code:
    lnQ = (From data In tblData.AsEnumerable() _
                    Where Regex.IsMatch(data.Genre_Id.ToString, strIds) _
              Select data)
    Of course, you understood that prior to that, I have built a "strIds" variable with a valid regex inside.
    I simply use the same subquery as shows above (to get a list of ID's) and "flatter" them into a regex like this :
    Code:
    strIds = "^(1|13|15|17|18|24|25|27)$"

    The solution
    This new query is done in about 0.0.250
    This is quite what I was looking for. From 20 seconds to 250 milliseconds : good.

    The regex does exactly he same as the .Contains()
    It just checks wheter the datarow.Genre_ID correspond to one of the possibles ID's

    But, it does it in a much better way with no loops.
    I love regex.
    Ans now, I know I can use regex in linQ !


    Conclusion

    There are some other possibilities I didnt really investigate.
    Sorry if I didnt really look for the UNION ALL option. Also I've read that one of the JOIN method could be a solution.

    Right now, I just needed to check the datatable if the rows.Genre_Id would correspond to one of the possible ID's, and the regex thing is just great.

    If anyone have another good solution, I always open to other possibilities.

    Sorry for the long answer, but i needed to explain. Hopefully this can help someone in the futur.
    And thanks for helping you guys made me think about it quite hard

  9. #9
    Frenzied Member
    Join Date
    Apr 2012
    Posts
    1,255

    Re: linQ - Need to optimize slow subquery code (.contains)

    For a two-tier hierarchy, surely you don't need the UNION at all?

    i.e.

    "SELECT * FROM Nodes Where NID=@ID Union All Select * From Nodes Where PID=@ID"

    is the same as

    "SELECT * FROM Nodes Where NID=@ID OR PID=@ID"
    If you don't know where you're going, any road will take you there...

    My VB6 love-children: Vee-Hive and Vee-Launcher

  10. #10
    Bad man! ident's Avatar
    Join Date
    Mar 2009
    Location
    Cambridge
    Posts
    5,398

    Re: linQ - Need to optimize slow subquery code (.contains)

    An issue is you use LINQ but don't actually use it. Whats with all them else statements? If you really wanted to abuse LINQ you could write it some thing like(example), no need for a select statement.

    vb Code:
    1. Dim query = (Enumerable.Range(0, 20).Select(Function(n) New With {.Id = n}).ToList).Where(Function(f) {1, 13, 15, 17, 18, 24, 25, 27}.Contains(f.Id)).AsParallel

  11. #11
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,255

    Re: linQ - Need to optimize slow subquery code (.contains)

    Quote Originally Posted by ColinE66 View Post
    For a two-tier hierarchy, surely you don't need the UNION at all?

    i.e.

    "SELECT * FROM Nodes Where NID=@ID Union All Select * From Nodes Where PID=@ID"

    is the same as

    "SELECT * FROM Nodes Where NID=@ID OR PID=@ID"
    Yep, that's right - at least for the simplified example (with only one table) ...
    I guess with the Union approach I had the OPs problem in mind, which spanned 2 tables
    (a Union making it always easier for a Query-Optimizer, to choose the right Index for the partial Select -
    the LINQ-optimizer perhaps being no exception in this regard)

    An adaption on the OPs side (still expressed in SQL not in LINQ) could have looked like:

    Select * From data Where Genre_id=@ID
    Union All
    Select * From data Where Genre_id In (Select id From genre Where Parent_id=@ID)

    Of course also the above could be expressed without a Union, using OR -
    but e.g. SQLite (at least in earlier versions) could only apply 1 properly
    choosen index per Table-Select - and the common workaround was
    therefore always, to use a Union instead of ORs.

    I assume, that the current SQLite-versions will have no problems with a directly
    OR combined result which involves more than one table (and table-index) -
    a concrete test with two tables, similar to the scenario of the OP, might be helpful
    to shed some more light on the current behaviour of SQLite in that regard...

    Olaf

  12. #12
    Frenzied Member
    Join Date
    Apr 2012
    Posts
    1,255

    Re: linQ - Need to optimize slow subquery code (.contains)

    My instinct tells me that, if NID and PID are both indexed, the example without the UNION would be quicker. Not tested, though...and off to bed. Have no idea how Sqlite might have optimized it in earlier versions: My instinct is based on past experience with Oracle and Sybase...
    If you don't know where you're going, any road will take you there...

    My VB6 love-children: Vee-Hive and Vee-Launcher

  13. #13
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,255

    Re: linQ - Need to optimize slow subquery code (.contains)

    Quote Originally Posted by ColinE66 View Post
    My instinct tells me that, if NID and PID are both indexed, the example without the UNION would be quicker. Not tested, though...and off to bed. Have no idea how Sqlite might have optimized it in earlier versions: My instinct is based on past experience with Oracle and Sybase...
    Just for completeness (and for those who might be interested in experimenting with the .NET-lib
    from sqlite.org (system.data.sqlite - which should offer comparable InMemory behaviour)...

    I've adapted the example more to the OPs problem now, but performance is roughly the same as before:

    Code:
    Inserts on Data+Genre        344.78msec    40200         40200 
    Node-Rename (cascaded)       1.21msec      40200         40200 
    Node-Delete (cascaded)       3.42msec      39999         39999   
    Select Node+Lvl1-Desc.       1.15msec      201
    The time for the initial Inserts is now higher, since I had to add 40200 records into each
    of the two tables (Data and Genre) - both now coming with a higher ColumnCount as well.

    The rest of the timings is basically as in the Single-Table example (Node-Deletes now
    taking a bit longer, since deleting a Node now cascades across two tables with twice
    the amount of records to delete).

    The Select across two tables shows no real difference in performance here with SQLite, no matter
    if I write it as a Union or with a simple OR... (timing's pretty fast, varying between 0.8msec and 1.5msec)

    Below is the updated VB6-Code:
    Code:
    Private Const NperLvl& = 200 '<- adjust the Nodes per Level here
    
    Private i&, j&, MemDB As New cMemDB, GenreInsert As cCommand, DataInsert As cCommand, Rs As cRecordset
    
    Private Sub Form_Load()
      MemDB.Exec "PRAGMA Foreign_Keys=ON" 'enable Foreign-Keys on this InMemory-DB
      
      MemDB.Exec "Create Table Data(  ID Text Primary Key," & _
                                 "    Genre_ID Text References Genre On Update Cascade On Delete Cascade," & _
                                 "    FileName Text" & _
                                 ") Without RowiD"
        
      MemDB.Exec "Create Table Genre( ID Text Primary Key," & _
                                 "    Parent_ID Text References Genre On Update Cascade On Delete Cascade," & _
                                 "    GenreData Text" & _
                                 ") Without RowiD"
      
      'create Insert-Commands for the two tables
      Set DataInsert = MemDB.CreateCommand(" Insert Into Data  Values(?,?,?)")
      Set GenreInsert = MemDB.CreateCommand("Insert Into Genre Values(?,?,?)")
     
      New_c.Timing True
        MemDB.BeginTrans 'ensure a bit more speed for the bulk-insert, by transaction-wrapping
          For i = 1 To NperLvl:    InsertNode "N_" & i 'ParentNodes
             For j = 1 To NperLvl: InsertNode "NC_" & i & "_" & j, "N_" & i 'ChildNodes
          Next j, i
        MemDB.CommitTrans
      Debug.Print "Inserts on Data+Genre", New_c.Timing, MemDB.GetCount("Genre"), MemDB.GetCount("Data")
      
      'set additional Indexes on Genre.Parent_ID and Data.Genre_ID before we proceed
      MemDB.Exec "Create Index Idx_Genre_Parent_ID On Genre(Parent_ID)"
      MemDB.Exec "Create Index Idx_Data_Genre_ID   On Data (Genre_ID)"
    
      New_c.Timing True 'Genre.ID-Updates will affect the ChildNodes too (due to Update Cascade, and in both Tables)
        MemDB.Exec "Update Genre Set ID='N_0' Where ID='N_1'"
      Debug.Print "Node-Rename (cascaded)", New_c.Timing, MemDB.GetCount("Genre"), MemDB.GetCount("Data")
     
      New_c.Timing True 'Genre.ID-Deletes will delete the ChildNodes as well (due to Delete Cascade, and in both Tables)
        MemDB.Exec "Delete From Genre Where ID='N_0'" 'delete the just renamed Node (and all its Children)
      Debug.Print "Node-Delete (cascaded)", New_c.Timing, MemDB.GetCount("Genre"), MemDB.GetCount("Data")
     
      New_c.Timing True
        Set Rs = GetDataNodeAndDescendants("N_3")
      Debug.Print "Select Node+Lvl1-Desc.", New_c.Timing, Rs.RecordCount
    End Sub
     
    Private Sub InsertNode(NodeID As String, Optional ParentID As String)
      GenreInsert.SetText 1, NodeID
      GenreInsert.SetText 2, ParentID
      GenreInsert.SetText 3, "Genre Data for " & NodeID
      GenreInsert.Execute
      
      DataInsert.SetText 1, "DataID_" & NodeID
      DataInsert.SetText 2, NodeID
      DataInsert.SetText 3, "FileName for " & "DataID_" & NodeID
      DataInsert.Execute
    End Sub
    
    Private Function GetDataNodeAndDescendants(NodeID As String) As cRecordset
    Const SQL$ = "Select * From Data Where Genre_ID=@ID OR Genre_ID In (Select ID From Genre Where Parent_ID=@ID)"
    'Const SQL$ = "Select * From Data Where Genre_ID=@ID Union All Select * From Data Where Genre_ID In (Select ID From Genre Where Parent_ID=@ID)"
      With MemDB.CreateSelectCommand(SQL)
        .SetText !ID, NodeID
        Set GetDataNodeAndDescendants = .Execute
      End With
    End Function
    Olaf

  14. #14

    Thread Starter
    Lively Member
    Join Date
    Jul 2009
    Posts
    85

    Re: linQ - Need to optimize slow subquery code (.contains)

    Quote Originally Posted by ident View Post
    An issue is you use LINQ but don't actually use it. Whats with all them else statements? If you really wanted to abuse LINQ you could write it some thing like(example), no need for a select statement.
    The query with all the Orelse ??
    I dont use this code actually it was just for a quick test.

    I needed to check if the problem was coming from the .Contains() method as many forum explain this method is really bad in linQ for upper databases. It's fine for small databases. (Mine is 20.000 at the moment but will go up to 50k or more for some users)


    vb Code:
    1. Dim query = (Enumerable.Range(0, 20).Select(Function(n) New With {.Id = n}).ToList).Where(Function(f) {1, 13, 15, 17, 18, 24, 25, 27}.Contains(f.Id)).AsParallel
    Ok this is a bit more complicated, would have to test and understant this query
    But so far I can tell, it's using a .Contains method that I'm trying to avoid now from my queries.

  15. #15

    Thread Starter
    Lively Member
    Join Date
    Jul 2009
    Posts
    85

    Re: linQ - Need to optimize slow subquery code (.contains)

    For the other codes I'll have to test also ! Thanks for the source code Olaf, I need some time to see how it performs.

    For the Union All, maybe I'll try to convert it to linQ, so directly usable into a vb.net code

  16. #16
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,255

    Re: linQ - Need to optimize slow subquery code (.contains)

    Quote Originally Posted by castaway View Post
    For the other codes I'll have to test also ! Thanks for the source code Olaf, I need some time to see how it performs.

    For the Union All, maybe I'll try to convert it to linQ, so directly usable into a vb.net code
    As for the (VB6)-Sources I've posted, one should be able to translate the example(s) to
    .NET-syntax without larger efforts, using the appropriate .NET-wrapper-classes for this
    DB-Engine, which are either available via:
    http://system.data.sqlite.org

    or alternatively as a more "hazzle-free-install" per NuGet-package via:
    https://www.nuget.org/packages/System.Data.SQLite/

    An InMemory-DB-Connection, which has Foreign Keys enabled from the get-go can be created this way:
    Code:
    var Cnn = new SQLiteConnection("Data Source=:memory:;Foreign Keys=True");
    CommandObject-Syntax might differ in a few details, but all in all it should be
    well-portable, keeping more or less the same code-structure.

    An attempt to use System.Data.SQLite.Linq directly might be worthwhile as well:
    http://vijayt.com/Post/Using-SQLite-...h-LINQ-to-SQL-

    Would be interested to hear, how the .NET-SQLite-wrapper compares (timing-wise)
    to the COM-based one I've used above.

    Olaf

  17. #17

    Thread Starter
    Lively Member
    Join Date
    Jul 2009
    Posts
    85

    Re: linQ - Need to optimize slow subquery code (.contains)

    Hi

    The file database I'm using is not SqLite but *.mdb (see post#8).
    When I first started this project, long time ago, I choose *.mdb instead of *.sqlite because I already had *.mdb file and stored procedures knowledge.

    Sqlite has done a long way since then, and maybe I'll try also to use it sometimes !

  18. #18
    PowerPoster
    Join Date
    Jun 2013
    Posts
    7,255

    Re: linQ - Need to optimize slow subquery code (.contains)

    Quote Originally Posted by castaway View Post
    The file database I'm using is not SqLite but *.mdb (see post#8).
    I'm entirely aware of that... (and wasn't really suggesting a switch from
    your *.mdb to an sqlite-db-file ... not at this point at least...).

    What I was trying to show was, that your "tree-like-scenario" (with most
    of your *.mdb-Tables being replicated already In-Memory by DataTables,
    then queried InMemory per LINQ), could be addressed also by an alternative
    InMemory-scenario which was based on an SQLite-InMemory-DB...

    Then using normal SQL (in SQLite -flavour) to query these alternative
    InMemory-(TreeLike)Structures in a better performance...

    What you currently have (or do) is (in case I understood correctly):

    1) <Perform Raw-Selects on the *.mdb-File, using the appropriate methods provided by a DataSet>
    ...
    2) <These full-table-selects end up as an InMemory-copy (a DataTable) and will be hosted there>
    ...
    3) <Query these DataTables per LINQ>


    And as I read it in your last posts, you're in the meantime down to ~250msec (from the former 25secs you started with)


    The scenario with SQLite as your "alternative InMemory-DB-Storage" would work this way:

    1) <Perform Raw-Selects on the *.mdb-File, transfer them directly into SQLite-InMemory-DB-Tables>
    ...
    2) <These table-transfers perform best when using a DataReader(e.g. OleDbDataReader) in conjunction with an SQLite-CommandObject>
    ...
    3) <Query the SQLite-InMemory-Tables either directly per "SQLite-SQL" - or per System.Data.SQLite.Linq>


    - That approach wouldn't need more Memory, compared with .NET DataSets/DataTables
    ..(I'd guess it's roughly half, because SQLite by default stores Strings as UTF8 internally, not as WChars)

    - And it will perform similar to the results I've posted above - needing roughly 0.5msec instead
    ..of your current 250msec - which is factor 500 - and still food for thought, on what's wrong with
    ..LINQ (or DataTables) in this scenario.

    Olaf

  19. #19

    Thread Starter
    Lively Member
    Join Date
    Jul 2009
    Posts
    85

    Re: linQ - Need to optimize slow subquery code (.contains)

    Oh wow,
    There are many alternatives indeed

    Ok I will put all of this on the side for now. I will be a bit busy for work for a couple of weeks but I have taken note of all those options.
    But I'll try it on my own when time will be here of me.

    Thanks a lot for the help, I'll come back to this thread for any interesting news

    Cast

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width