linQ - Need to optimize slow subquery code (.contains)

**castaway** · Feb 7th, 2015, 12:21 PM

Hello,

I'm using this code to get a subquery :

Code:

dim lnQ = (From data In tblData _
             Where data.Genre_Id = "1" _
             OrElse (From genre In tblStructure _
                          Where genre.parent_Id = "1" _
                     Select genre.Id).Contains(data.Genre_Id) _
             Select data).ToList

Basically, I have 2 datatables

- The first is a Treeview structure

Code:

Category (.id="1")
  |- Genre a (.id="13")
  |- Genre b (.id="15")
  |- ...
  |- Genre n

- The second keeps the data (20.000 rows)

I need to pull "all rows from the category, including all rows from it's subcategories"

My code works, but it has dramatic results (20/30 seconds to get it done), when I need an instant response. I know the .Contains is really bad.

What would be a good fix to totally get rid of the .contains ???

Thanks in advance!

**Shaggy Hiker** · Feb 7th, 2015, 03:32 PM

Don't use LINQ?

LINQ is slow, in general, but the fundamental solution is to reconsider the problem. I would guess that the "1" you are using is just an example, or else there isn't any point in the LINQ, but one thing you might consider is putting the tblStructure into a Dictionary with the parent ID as the key. Finding out whether or not a Dicitonary holds a key is VERY fast compared to what you are doing. Whether or not this will work in your case is harder to say.

**castaway** · Feb 7th, 2015, 08:50 PM

Thanks for your input.

Yes the "1" is an exemple indeed. The project is of course much more complex but I just changed it make it easier for here. (But the nodes from the tree structure really holds a unique Id, so that when the user chose one node I already have the ID).

I know that linQ is slow sometimes, but I guess it's more because of the programmer (me). I mean, most of the other queries do the job in 0.0.100, so 100 milliseconds (for the same exact database with 20.000 rows). That's pretty good.

Just this specific query, with .Contains is a disaster.
I'm sure there are some good workaround to optimize it.

Thanks for the dictionnary option, but the problem is not when querying the table with the structure, but rather the main table in the database. I have no choice of doing a query on it (there are about 10 datatables with relations, primary keys etc..).

In other words, the subquery is quite fast, but the main query (with .contains) is the problem. I tryed them both seperatly..

**Schmidt** · Feb 8th, 2015, 02:34 PM

Originally Posted by castaway

Just this specific query, with .Contains is a disaster.
I'm sure there are some good workaround to optimize it.

Can't advise you about a better LINQ-Statement, but in case it is the
data-record of the Parent-Node you're after, followed by the data-records
of its first-level-descendants, then (in SQL) a simple UNION ALL would suffice...
(there should be an equivalent for a UNION in LINQ-syntax).

Also not sure, if you already have an index in place on your ParentID-Field(s) -
since that usually speeds up things as well.

Just played that through here with an SQLite-based InMemory-DB, to give you
an impression about the timings, such things usually (should) take.

The following is VB6-code - but from the Create Table Stament and the used SQL
you should be able to deduce what I was doing on this simple Adjacency-List-example...
(which is two levels deep: 200ParentNodes with 200ChildNodes each = 40200 records total) :

Code:

Option Explicit

Private Const NperLvl& = 200 '<- adjust the Nodes per Level here

Private i&, j&, MemDB As New cMemDB, NodeInsert As cCommand, Rs As cRecordset
 
Private Sub Form_Load()
  MemDB.Exec "PRAGMA Foreign_Keys=ON" 'enable Foreign-Keys on this InMemory-DB
  
  MemDB.Exec "Create Table Nodes(   NID Text Primary Key," & _
                               "    PID Text References Nodes On Update Cascade On Delete Cascade" & _
                               ") Without RowID"
                               
  'Insert example-data into the Adjacency-List we created above
  Set NodeInsert = MemDB.CreateCommand("Insert Into Nodes Values(?,?)")
  New_c.Timing True
    For i = 1 To NperLvl:    InsertNode "N_" & i 'ParentNodes
       For j = 1 To NperLvl: InsertNode "NC_" & i & "_" & j, "N_" & i 'ChildNodes
    Next j, i
  Debug.Print "Inserts on the Adj-Lst", New_c.Timing, MemDB.GetCount("Nodes")
  
  'set an additional Index on the ParentID, before we proceed
  MemDB.Exec "Create Index Idx_Nodes_PID On Nodes(PID)"

  New_c.Timing True 'NID-Updates will affect the ChildNodes PID too (due to Update Cascade in our Table-Def)
    MemDB.Exec "Update Nodes Set NID='N_0' Where NID='N_1'"
  Debug.Print "Node-Rename (cascaded)", New_c.Timing, MemDB.GetCount("Nodes")

  New_c.Timing True 'NID-Deletes will delete the ChildNodes as well (due to Delete Cascade)
    MemDB.Exec "Delete From Nodes Where NID='N_0'" 'delete the just renamed Node (and all its Children)
  Debug.Print "Node-Delete (cascaded)", New_c.Timing, MemDB.GetCount("Nodes")

  New_c.Timing True
    Set Rs = GetNodeAndDescendants("N_3")
  Debug.Print "Select Node+Lvl1-Desc.", New_c.Timing, Rs.RecordCount
End Sub
 
Private Sub InsertNode(NodeID As String, Optional ParentID As String)
  NodeInsert.SetText 1, NodeID
  NodeInsert.SetText 2, ParentID
  NodeInsert.Execute
End Sub

Private Function GetNodeAndDescendants(NodeID As String) As cRecordset
  With MemDB.CreateSelectCommand("SELECT * FROM Nodes Where NID=@ID Union All Select * From Nodes Where PID=@ID")
    .SetText !ID, NodeID
    Set GetNodeAndDescendants = .Execute
  End With
End Function

The two SQL-Strings which are (IMO) performance-relevant also in your case, I've colored magenta.

Here's the timings, the above code reports:

Code:

Inserts on the Adj-Lst       288.70msec    40200 
Node-Rename (cascaded)       1.05msec      40200 
Node-Delete (cascaded)       1.79msec      39999 
Select Node+Lvl1-Desc.       0.86msec      201

Same thing again, this time with a commented out Extra-Index-Creation on the ParentID

Code:

Inserts on the Adj-Lst       289.92msec    40200 
Node-Rename (cascaded)       12.58msec     40200 
Node-Delete (cascaded)       1,602.38msec  39999 
Select Node+Lvl1-Desc.       4.56msec      201

Leaving out the Index will (significantly) slow down especially the cascaded Node-Delete,
though the other timings (albeit factor 5-12 slower) are still bearable.

Olaf

**Shaggy Hiker** · Feb 8th, 2015, 03:44 PM

There are workarounds, and Schmidt has shown a good one, though it may not be practical, either. If you can do the job in SQL when populating the tables, then that will generally be faster than any manipulation afterwards. What I was suggesting is that you have found a situation where the way you have the data organized is slowing you down, so store it a different way that works for that particular problem. There is no reason why data has to be stored only one way. Memory is cheap. I often have dictionaries for looking up one thing based on another, and sometimes I build reverse dictionaries with the same data to speed up a particular sub-case.

In your case, I would guess that the LINQ being generated builds horrible intermediate tables MANY MANY times over as it iterates through the collections. Exactly how it will handle that scenario I can't say, but based on the slowdown you are seeing, I would say that you have a loop within a loop. Either loop on its own wouldn't be so bad, but combined you multiply the time by the number of iterations in some way that causes the time to explode in a more than exponential fashion. So, there may not be a simple, "do this, and all will be well" kind of solution. It may be that you need to have a different organization of the data that deals with this particular case, whether it be a different SQL query or something else.

**Schmidt** · Feb 8th, 2015, 04:35 PM

Originally Posted by Shaggy Hiker

If you can do the job in SQL when populating the tables, then that will generally be faster than any manipulation afterwards.

Sometimes direct SQL-queries will be slower than "SQL-like" querying from "InMemory-DB-like" DataStructures...
Server-Roundtrips can be expensive - and well-balanced clientside processing is worthwhile to reduce ServerLoad -
as demonstrated in modern WebApps with jQuery+Ajax and Browser-internally handled sorting, filtering etc...
of Data which was selected and transferred from the serverside in less expensive to provide "Raw-ResultSets".

Originally Posted by Shaggy Hiker

What I was suggesting is that you have found a situation where the way you have the data organized is slowing you down, so store it a different way that works for that particular problem. There is no reason why data has to be stored only one way. Memory is cheap.

Cheap memory (along with convenience) is the reason behind .NET-DataTables and SQL-like LINQ-expressions
which in combination form the same thing as I was using above - an InMemory-DB-scenario, able to buffer
"Raw-Selects" in a way, which one can later on conveniently perform "Selects, Filtering, Ordering, Joining, Grouping" on
(in a decent performance).

Originally Posted by Shaggy Hiker

I often have dictionaries for looking up one thing based on another, and sometimes I build reverse dictionaries with the same data to speed up a particular sub-case.

And exactly that should *not* be necessary when the alternative "InMemory-Convenience-Layer" (DataTables + LINQ)
is well-implemented and supports Indexing (because indexed lookups are, what you perform on a Dictionary as well).

Originally Posted by Shaggy Hiker

In your case, I would guess that the LINQ being generated builds horrible intermediate tables MANY MANY times over as it iterates through the collections.

Well, that's the case with inefficiently formulated "plain SQL" (in conjunction with missing or misplaced indexes) too.
DataTables+LINQ should be able to perform basically at the same Level as my small SQLite-InMemory example...

If that is not possible, then I'd not see the point of LINQ-integration (since the amount of Data the OP was
mentioning (20000 records or so), is still in the lower range, perfectly suited for such InMemory-massaging.

Perhaps somebody with more LINQ-experience will point out a (much) better performing expression -
I was just trying to give some hints (especially with regards to indexing also the ParentKey-Column),
where the problem might be located - and what the timings should come out like, after the problem
was solved satisfyingly.

Olaf

**ident** · Feb 8th, 2015, 04:38 PM

I have never seen a LINQ expression written in such a way. I am no expert with data tables so if you could give me a sample i could try to re-write the expression. PLINQ would also likely be more beneficial here. I would wager a compiled regex pattern would be quicker then contains over large querys but do not quote me on that.

**castaway** · Feb 8th, 2015, 07:27 PM

First, thanks for your inputs. I appreciate it !
Now I will give just a little more info for the program, before I answer. I think I actually found a good workaround.
Info:

1/ The database
I use a *.mdb file to keep the database. It's perfect for my needs, extrely fast also.
When the program loads, it will load a dataset.
The data scheme (tables, relations, primary keys etc..) has been studied to be as efficient as possible. It's a real database (and not a single table with everything in it as we can see sometimes)
The dataset is actually kind of a copy of the real database file. Works great.

2/ The Treeview
In my program, I have a treeview in the left side. On load, this treeview is loaded with its structure.
To explain better : the first nodes are done by the program itself, but then the user can arrange it the way he wants. He can also drag and drop, making "sub categories".
In the total, there are 4 kinds of nodes with different properties. For each node, I needed to keep a lot of different data, so I decided to store the treeview structure into a datatable. A typical database has something like 3000 nodes so a datatable was the good choice I think, to save the structure.

The action
Ok this is really straightforward : The user click on a treenode to see the files (from the chosen category) in a listview on the right side.

So when a click is done :
-a- I query the dataset, with an argument being the "ID" of the treenode.
-b- The query gives me a list of files.
-c- I populate the listview. The listview has about 8 different column, but I get everything from my query.

This usually takes about 100 milliseconds.

The engeenering
I think the way I built this is rather "kiss" (keep it small and simple)

The treenode has nothing to do with the problem I have for the linQ query. Sorry if I wasnt clear at the beginning.
The treenode is just the way the user choses its ID to show in the listview. Please don't worry about the treeview, I can get the ID's I need in two lines of code and I'm fine with it.

The recursiveness
So far, so good.
I then added an option : the recursive selection. Let's say a user want's to see all files from a category, AND all files from its children.

I have many choices :
- Query the dataset for every treenode one by one, starting with the chose node, then going through the childrens (usually there are max 15 sub categories). This is a good option and I can populate the listview each time
- Do the same and keep the results in some temp variable, then populate the listview
- Do a better query that will look for everything I need at the same time, and get everything in only one time

As you can tell, at the moment I chose the third option.

The problem
As some of you have mentionned, there are loops.
But my code is rather straighforward :

Code:

dim lnQ = (From data In tblData _
             Where data.Genre_Id = "1" _
             OrElse (From genre In tblStructure _
                          Where genre.parent_Id = "1" _
                       Select genre.Id).Contains(data.Genre_Id) _
             Select data).ToList

First there's a subquery : It will get a simple list of the children ID's. This is extremly fast.
Second, the query for the files to populate : it check if the row has the corresponding Genre_ID, if not, another chance, it checks whether its Genre_Id is one of the childrens ID.

Where are the loops ? Well there are being created when the compiler changes the linQ into real SQL.
.Contains() == Where In

I've read that this cause the compiler to do something like "where in (a) or (b) or (c) or (n...)"
So, I guess for every row in the datatable that meet the first where clause, they are then compared to every options in the .Contains() extention, one by one. I dont know how and why, but there are loops at this moment.

This is what gives the 20 seconds for this particular query.

Some solutions
First I though I would change a little this query to put it this way :

Code:

dim subquery = (From genre In tblStructure _
                            Where genre.Id = "1" _
                            OrElse genre.parent_Id = "1" _
                        Select genre.Id)

dim lnQ = (From data In tblData _
                  Where subquery.Contains(data.Genre_Id) _
             Select data)

It works, results are correct.
I just splitted the main query from the subquery.
I added the main treenode ID directly with it's children ID's (so that it only compare once instead of 2x in the previous version)
Impact on the query : NONE !
Same time.
This is where I understood the subquery is just fine. Debugging it shows me that it holds on the second query (where contains....)

What's next ?
I then though about going back to old time SELECT() method from the datatables.
I'm sure I can do it.

But I needed to try something before :

Code:

dim lnQ = (From data In tblData _
                            Where genre.Id = "1" _
                              OrElse genre.Id = "13" _
                              OrElse genre.Id = "15" _
                              OrElse genre.Id = "17" _
                              OrElse genre.Id = "18" _
                              OrElse genre.Id = "24" _
                              OrElse genre.Id = "25" _
                              OrElse genre.Id = "27" _
                 Select genre.Id)

Bam. Works perfectly, with less than a second.
So, either I go with something like a tblData.Select("...") and I build the corresponding query here for the dataset

or... I keep searching.

Final stage : Regex !
LinQ is just too good not to use it.
Ok, the subqueries are really bad. The only solution I found on forums and internet was this .Contains() method, which actually seems logical, if only it wasnt so slow.

Then I thought about regex.
I love regex.

This is what I came with :

Code:

lnQ = (From data In tblData.AsEnumerable() _
                Where Regex.IsMatch(data.Genre_Id.ToString, strIds) _
          Select data)

Of course, you understood that prior to that, I have built a "strIds" variable with a valid regex inside.
I simply use the same subquery as shows above (to get a list of ID's) and "flatter" them into a regex like this :

Code:

strIds = "^(1|13|15|17|18|24|25|27)$"

The solution
This new query is done in about 0.0.250
This is quite what I was looking for. From 20 seconds to 250 milliseconds : good.

The regex does exactly he same as the .Contains()
It just checks wheter the datarow.Genre_ID correspond to one of the possibles ID's

But, it does it in a much better way with no loops.
I love regex.
Ans now, I know I can use regex in linQ !

Conclusion

There are some other possibilities I didnt really investigate.
Sorry if I didnt really look for the UNION ALL option. Also I've read that one of the JOIN method could be a solution.

Right now, I just needed to check the datatable if the rows.Genre_Id would correspond to one of the possible ID's, and the regex thing is just great.

If anyone have another good solution, I always open to other possibilities.

Sorry for the long answer, but i needed to explain. Hopefully this can help someone in the futur.
And thanks for helping you guys made me think about it quite hard

**ColinE66** · Feb 8th, 2015, 07:30 PM

For a two-tier hierarchy, surely you don't need the UNION at all?

i.e.

"SELECT * FROM Nodes Where NID=@ID Union All Select * From Nodes Where PID=@ID"

is the same as

"SELECT * FROM Nodes Where NID=@ID OR PID=@ID"

**ident** · Feb 8th, 2015, 08:11 PM

An issue is you use LINQ but don't actually use it. Whats with all them else statements? If you really wanted to abuse LINQ you could write it some thing like(example), no need for a select statement.

vb Code:

Dim query = (Enumerable.Range(0, 20).Select(Function(n) New With {.Id = n}).ToList).Where(Function(f) {1, 13, 15, 17, 18, 24, 25, 27}.Contains(f.Id)).AsParallel

**Schmidt** · Feb 8th, 2015, 08:33 PM

Originally Posted by ColinE66

For a two-tier hierarchy, surely you don't need the UNION at all?

i.e.

"SELECT * FROM Nodes Where NID=@ID Union All Select * From Nodes Where PID=@ID"

is the same as

"SELECT * FROM Nodes Where NID=@ID OR PID=@ID"

Yep, that's right - at least for the simplified example (with only one table) ...
I guess with the Union approach I had the OPs problem in mind, which spanned 2 tables
(a Union making it always easier for a Query-Optimizer, to choose the right Index for the partial Select -
the LINQ-optimizer perhaps being no exception in this regard)

An adaption on the OPs side (still expressed in SQL not in LINQ) could have looked like:

Select * From data Where Genre_id=@ID
Union All
Select * From data Where Genre_id In (Select id From genre Where Parent_id=@ID)

Of course also the above could be expressed without a Union, using OR -
but e.g. SQLite (at least in earlier versions) could only apply 1 properly
choosen index per Table-Select - and the common workaround was
therefore always, to use a Union instead of ORs.

I assume, that the current SQLite-versions will have no problems with a directly
OR combined result which involves more than one table (and table-index) -
a concrete test with two tables, similar to the scenario of the OP, might be helpful
to shed some more light on the current behaviour of SQLite in that regard...

Olaf

**ColinE66** · Feb 8th, 2015, 09:02 PM

My instinct tells me that, if NID and PID are both indexed, the example without the UNION would be quicker. Not tested, though...and off to bed. Have no idea how Sqlite might have optimized it in earlier versions: My instinct is based on past experience with Oracle and Sybase...

**Schmidt** · Feb 9th, 2015, 09:47 AM

Originally Posted by ColinE66

My instinct tells me that, if NID and PID are both indexed, the example without the UNION would be quicker. Not tested, though...and off to bed. Have no idea how Sqlite might have optimized it in earlier versions: My instinct is based on past experience with Oracle and Sybase...

Just for completeness (and for those who might be interested in experimenting with the .NET-lib
from sqlite.org (system.data.sqlite - which should offer comparable InMemory behaviour)...

I've adapted the example more to the OPs problem now, but performance is roughly the same as before:

Code:

Inserts on Data+Genre        344.78msec    40200         40200 
Node-Rename (cascaded)       1.21msec      40200         40200 
Node-Delete (cascaded)       3.42msec      39999         39999   
Select Node+Lvl1-Desc.       1.15msec      201

The time for the initial Inserts is now higher, since I had to add 40200 records into each
of the two tables (Data and Genre) - both now coming with a higher ColumnCount as well.

The rest of the timings is basically as in the Single-Table example (Node-Deletes now
taking a bit longer, since deleting a Node now cascades across two tables with twice
the amount of records to delete).

The Select across two tables shows no real difference in performance here with SQLite, no matter
if I write it as a Union or with a simple OR... (timing's pretty fast, varying between 0.8msec and 1.5msec)

Below is the updated VB6-Code:

Code:

Private Const NperLvl& = 200 '<- adjust the Nodes per Level here

Private i&, j&, MemDB As New cMemDB, GenreInsert As cCommand, DataInsert As cCommand, Rs As cRecordset

Private Sub Form_Load()
  MemDB.Exec "PRAGMA Foreign_Keys=ON" 'enable Foreign-Keys on this InMemory-DB
  
  MemDB.Exec "Create Table Data(  ID Text Primary Key," & _
                             "    Genre_ID Text References Genre On Update Cascade On Delete Cascade," & _
                             "    FileName Text" & _
                             ") Without RowiD"
    
  MemDB.Exec "Create Table Genre( ID Text Primary Key," & _
                             "    Parent_ID Text References Genre On Update Cascade On Delete Cascade," & _
                             "    GenreData Text" & _
                             ") Without RowiD"
  
  'create Insert-Commands for the two tables
  Set DataInsert = MemDB.CreateCommand(" Insert Into Data  Values(?,?,?)")
  Set GenreInsert = MemDB.CreateCommand("Insert Into Genre Values(?,?,?)")
 
  New_c.Timing True
    MemDB.BeginTrans 'ensure a bit more speed for the bulk-insert, by transaction-wrapping
      For i = 1 To NperLvl:    InsertNode "N_" & i 'ParentNodes
         For j = 1 To NperLvl: InsertNode "NC_" & i & "_" & j, "N_" & i 'ChildNodes
      Next j, i
    MemDB.CommitTrans
  Debug.Print "Inserts on Data+Genre", New_c.Timing, MemDB.GetCount("Genre"), MemDB.GetCount("Data")
  
  'set additional Indexes on Genre.Parent_ID and Data.Genre_ID before we proceed
  MemDB.Exec "Create Index Idx_Genre_Parent_ID On Genre(Parent_ID)"
  MemDB.Exec "Create Index Idx_Data_Genre_ID   On Data (Genre_ID)"

  New_c.Timing True 'Genre.ID-Updates will affect the ChildNodes too (due to Update Cascade, and in both Tables)
    MemDB.Exec "Update Genre Set ID='N_0' Where ID='N_1'"
  Debug.Print "Node-Rename (cascaded)", New_c.Timing, MemDB.GetCount("Genre"), MemDB.GetCount("Data")
 
  New_c.Timing True 'Genre.ID-Deletes will delete the ChildNodes as well (due to Delete Cascade, and in both Tables)
    MemDB.Exec "Delete From Genre Where ID='N_0'" 'delete the just renamed Node (and all its Children)
  Debug.Print "Node-Delete (cascaded)", New_c.Timing, MemDB.GetCount("Genre"), MemDB.GetCount("Data")
 
  New_c.Timing True
    Set Rs = GetDataNodeAndDescendants("N_3")
  Debug.Print "Select Node+Lvl1-Desc.", New_c.Timing, Rs.RecordCount
End Sub
 
Private Sub InsertNode(NodeID As String, Optional ParentID As String)
  GenreInsert.SetText 1, NodeID
  GenreInsert.SetText 2, ParentID
  GenreInsert.SetText 3, "Genre Data for " & NodeID
  GenreInsert.Execute
  
  DataInsert.SetText 1, "DataID_" & NodeID
  DataInsert.SetText 2, NodeID
  DataInsert.SetText 3, "FileName for " & "DataID_" & NodeID
  DataInsert.Execute
End Sub

Private Function GetDataNodeAndDescendants(NodeID As String) As cRecordset
Const SQL$ = "Select * From Data Where Genre_ID=@ID OR Genre_ID In (Select ID From Genre Where Parent_ID=@ID)"
'Const SQL$ = "Select * From Data Where Genre_ID=@ID Union All Select * From Data Where Genre_ID In (Select ID From Genre Where Parent_ID=@ID)"
  With MemDB.CreateSelectCommand(SQL)
    .SetText !ID, NodeID
    Set GetDataNodeAndDescendants = .Execute
  End With
End Function

Olaf

**castaway** · Feb 10th, 2015, 01:40 PM

Originally Posted by ident

An issue is you use LINQ but don't actually use it. Whats with all them else statements? If you really wanted to abuse LINQ you could write it some thing like(example), no need for a select statement.

The query with all the Orelse ??
I dont use this code actually it was just for a quick test.

I needed to check if the problem was coming from the .Contains() method as many forum explain this method is really bad in linQ for upper databases. It's fine for small databases. (Mine is 20.000 at the moment but will go up to 50k or more for some users)

vb Code:

Dim query = (Enumerable.Range(0, 20).Select(Function(n) New With {.Id = n}).ToList).Where(Function(f) {1, 13, 15, 17, 18, 24, 25, 27}.Contains(f.Id)).AsParallel

Ok this is a bit more complicated, would have to test and understant this query
But so far I can tell, it's using a .Contains method that I'm trying to avoid now from my queries.

**castaway** · Feb 10th, 2015, 01:43 PM

For the other codes I'll have to test also ! Thanks for the source code Olaf, I need some time to see how it performs.

For the Union All, maybe I'll try to convert it to linQ, so directly usable into a vb.net code

**Schmidt** · Feb 10th, 2015, 04:15 PM

Originally Posted by castaway

For the other codes I'll have to test also ! Thanks for the source code Olaf, I need some time to see how it performs.

For the Union All, maybe I'll try to convert it to linQ, so directly usable into a vb.net code

As for the (VB6)-Sources I've posted, one should be able to translate the example(s) to
.NET-syntax without larger efforts, using the appropriate .NET-wrapper-classes for this
DB-Engine, which are either available via:
http://system.data.sqlite.org

or alternatively as a more "hazzle-free-install" per NuGet-package via:
https://www.nuget.org/packages/System.Data.SQLite/

An InMemory-DB-Connection, which has Foreign Keys enabled from the get-go can be created this way:

Code:

var Cnn = new SQLiteConnection("Data Source=:memory:;Foreign Keys=True");

CommandObject-Syntax might differ in a few details, but all in all it should be
well-portable, keeping more or less the same code-structure.

An attempt to use System.Data.SQLite.Linq directly might be worthwhile as well:
http://vijayt.com/Post/Using-SQLite-...h-LINQ-to-SQL-

Would be interested to hear, how the .NET-SQLite-wrapper compares (timing-wise)
to the COM-based one I've used above.

Olaf

**castaway** · Feb 11th, 2015, 10:30 AM

Hi

The file database I'm using is not SqLite but *.mdb (see post#8).
When I first started this project, long time ago, I choose *.mdb instead of *.sqlite because I already had *.mdb file and stored procedures knowledge.

Sqlite has done a long way since then, and maybe I'll try also to use it sometimes !

**Schmidt** · Feb 11th, 2015, 11:48 AM

Originally Posted by castaway

The file database I'm using is not SqLite but *.mdb (see post#8).

I'm entirely aware of that... (and wasn't really suggesting a switch from
your *.mdb to an sqlite-db-file ... not at this point at least...).

What I was trying to show was, that your "tree-like-scenario" (with most
of your *.mdb-Tables being replicated already In-Memory by DataTables,
then queried InMemory per LINQ), could be addressed also by an alternative
InMemory-scenario which was based on an SQLite-InMemory-DB...

Then using normal SQL (in SQLite -flavour) to query these alternative
InMemory-(TreeLike)Structures in a better performance...

What you currently have (or do) is (in case I understood correctly):

1) <Perform Raw-Selects on the *.mdb-File, using the appropriate methods provided by a DataSet>
...
2) <These full-table-selects end up as an InMemory-copy (a DataTable) and will be hosted there>
...
3) <Query these DataTables per LINQ>

And as I read it in your last posts, you're in the meantime down to ~250msec (from the former 25secs you started with)

The scenario with SQLite as your "alternative InMemory-DB-Storage" would work this way:

1) <Perform Raw-Selects on the *.mdb-File, transfer them directly into SQLite-InMemory-DB-Tables>
...
2) <These table-transfers perform best when using a DataReader(e.g. OleDbDataReader) in conjunction with an SQLite-CommandObject>
...
3) <Query the SQLite-InMemory-Tables either directly per "SQLite-SQL" - or per System.Data.SQLite.Linq>

- That approach wouldn't need more Memory, compared with .NET DataSets/DataTables
..(I'd guess it's roughly half, because SQLite by default stores Strings as UTF8 internally, not as WChars)

- And it will perform similar to the results I've posted above - needing roughly 0.5msec instead
..of your current 250msec - which is factor 500 - and still food for thought, on what's wrong with
..LINQ (or DataTables) in this scenario.

Olaf

**castaway** · Feb 11th, 2015, 12:32 PM

Oh wow,
There are many alternatives indeed

Ok I will put all of this on the side for now. I will be a bit busy for work for a couple of weeks but I have taken note of all those options.
But I'll try it on my own when time will be here of me.

Thanks a lot for the help, I'll come back to this thread for any interesting news

Cast

Thread: linQ - Need to optimize slow subquery code (.contains)

Thread Tools

Display

linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Re: linQ - Need to optimize slow subquery code (.contains)

Tags for this Thread

Posting Permissions