-
Feb 7th, 2015, 12:21 PM
#1
Thread Starter
Lively Member
linQ - Need to optimize slow subquery code (.contains)
Hello,
I'm using this code to get a subquery :
Code:
dim lnQ = (From data In tblData _
Where data.Genre_Id = "1" _
OrElse (From genre In tblStructure _
Where genre.parent_Id = "1" _
Select genre.Id).Contains(data.Genre_Id) _
Select data).ToList
Basically, I have 2 datatables
- The first is a Treeview structure
Code:
Category (.id="1")
|- Genre a (.id="13")
|- Genre b (.id="15")
|- ...
|- Genre n
- The second keeps the data (20.000 rows)
I need to pull "all rows from the category, including all rows from it's subcategories"
My code works, but it has dramatic results (20/30 seconds to get it done), when I need an instant response. I know the .Contains is really bad.
What would be a good fix to totally get rid of the .contains ???
Thanks in advance!
-
Feb 7th, 2015, 03:32 PM
#2
Re: linQ - Need to optimize slow subquery code (.contains)
Don't use LINQ?
LINQ is slow, in general, but the fundamental solution is to reconsider the problem. I would guess that the "1" you are using is just an example, or else there isn't any point in the LINQ, but one thing you might consider is putting the tblStructure into a Dictionary with the parent ID as the key. Finding out whether or not a Dicitonary holds a key is VERY fast compared to what you are doing. Whether or not this will work in your case is harder to say.
My usual boring signature: Nothing
-
Feb 7th, 2015, 08:50 PM
#3
Thread Starter
Lively Member
Re: linQ - Need to optimize slow subquery code (.contains)
Thanks for your input.
Yes the "1" is an exemple indeed. The project is of course much more complex but I just changed it make it easier for here. (But the nodes from the tree structure really holds a unique Id, so that when the user chose one node I already have the ID).
I know that linQ is slow sometimes, but I guess it's more because of the programmer (me). I mean, most of the other queries do the job in 0.0.100, so 100 milliseconds (for the same exact database with 20.000 rows). That's pretty good.
Just this specific query, with .Contains is a disaster.
I'm sure there are some good workaround to optimize it.
Thanks for the dictionnary option, but the problem is not when querying the table with the structure, but rather the main table in the database. I have no choice of doing a query on it (there are about 10 datatables with relations, primary keys etc..).
In other words, the subquery is quite fast, but the main query (with .contains) is the problem. I tryed them both seperatly..
Last edited by castaway; Feb 7th, 2015 at 09:09 PM.
-
Feb 8th, 2015, 02:34 PM
#4
Re: linQ - Need to optimize slow subquery code (.contains)
Originally Posted by castaway
Just this specific query, with .Contains is a disaster.
I'm sure there are some good workaround to optimize it.
Can't advise you about a better LINQ-Statement, but in case it is the
data-record of the Parent-Node you're after, followed by the data-records
of its first-level-descendants, then (in SQL) a simple UNION ALL would suffice...
(there should be an equivalent for a UNION in LINQ-syntax).
Also not sure, if you already have an index in place on your ParentID-Field(s) -
since that usually speeds up things as well.
Just played that through here with an SQLite-based InMemory-DB, to give you
an impression about the timings, such things usually (should) take.
The following is VB6-code - but from the Create Table Stament and the used SQL
you should be able to deduce what I was doing on this simple Adjacency-List-example...
(which is two levels deep: 200ParentNodes with 200ChildNodes each = 40200 records total) :
Code:
Option Explicit
Private Const NperLvl& = 200 '<- adjust the Nodes per Level here
Private i&, j&, MemDB As New cMemDB, NodeInsert As cCommand, Rs As cRecordset
Private Sub Form_Load()
MemDB.Exec "PRAGMA Foreign_Keys=ON" 'enable Foreign-Keys on this InMemory-DB
MemDB.Exec "Create Table Nodes( NID Text Primary Key," & _
" PID Text References Nodes On Update Cascade On Delete Cascade" & _
") Without RowID"
'Insert example-data into the Adjacency-List we created above
Set NodeInsert = MemDB.CreateCommand("Insert Into Nodes Values(?,?)")
New_c.Timing True
For i = 1 To NperLvl: InsertNode "N_" & i 'ParentNodes
For j = 1 To NperLvl: InsertNode "NC_" & i & "_" & j, "N_" & i 'ChildNodes
Next j, i
Debug.Print "Inserts on the Adj-Lst", New_c.Timing, MemDB.GetCount("Nodes")
'set an additional Index on the ParentID, before we proceed
MemDB.Exec "Create Index Idx_Nodes_PID On Nodes(PID)"
New_c.Timing True 'NID-Updates will affect the ChildNodes PID too (due to Update Cascade in our Table-Def)
MemDB.Exec "Update Nodes Set NID='N_0' Where NID='N_1'"
Debug.Print "Node-Rename (cascaded)", New_c.Timing, MemDB.GetCount("Nodes")
New_c.Timing True 'NID-Deletes will delete the ChildNodes as well (due to Delete Cascade)
MemDB.Exec "Delete From Nodes Where NID='N_0'" 'delete the just renamed Node (and all its Children)
Debug.Print "Node-Delete (cascaded)", New_c.Timing, MemDB.GetCount("Nodes")
New_c.Timing True
Set Rs = GetNodeAndDescendants("N_3")
Debug.Print "Select Node+Lvl1-Desc.", New_c.Timing, Rs.RecordCount
End Sub
Private Sub InsertNode(NodeID As String, Optional ParentID As String)
NodeInsert.SetText 1, NodeID
NodeInsert.SetText 2, ParentID
NodeInsert.Execute
End Sub
Private Function GetNodeAndDescendants(NodeID As String) As cRecordset
With MemDB.CreateSelectCommand("SELECT * FROM Nodes Where NID=@ID Union All Select * From Nodes Where PID=@ID")
.SetText !ID, NodeID
Set GetNodeAndDescendants = .Execute
End With
End Function
The two SQL-Strings which are (IMO) performance-relevant also in your case, I've colored magenta.
Here's the timings, the above code reports:
Code:
Inserts on the Adj-Lst 288.70msec 40200
Node-Rename (cascaded) 1.05msec 40200
Node-Delete (cascaded) 1.79msec 39999
Select Node+Lvl1-Desc. 0.86msec 201
Same thing again, this time with a commented out Extra-Index-Creation on the ParentID
Code:
Inserts on the Adj-Lst 289.92msec 40200
Node-Rename (cascaded) 12.58msec 40200
Node-Delete (cascaded) 1,602.38msec 39999
Select Node+Lvl1-Desc. 4.56msec 201
Leaving out the Index will (significantly) slow down especially the cascaded Node-Delete,
though the other timings (albeit factor 5-12 slower) are still bearable.
Olaf
Last edited by Schmidt; Feb 8th, 2015 at 07:21 PM.
-
Feb 8th, 2015, 03:44 PM
#5
Re: linQ - Need to optimize slow subquery code (.contains)
There are workarounds, and Schmidt has shown a good one, though it may not be practical, either. If you can do the job in SQL when populating the tables, then that will generally be faster than any manipulation afterwards. What I was suggesting is that you have found a situation where the way you have the data organized is slowing you down, so store it a different way that works for that particular problem. There is no reason why data has to be stored only one way. Memory is cheap. I often have dictionaries for looking up one thing based on another, and sometimes I build reverse dictionaries with the same data to speed up a particular sub-case.
In your case, I would guess that the LINQ being generated builds horrible intermediate tables MANY MANY times over as it iterates through the collections. Exactly how it will handle that scenario I can't say, but based on the slowdown you are seeing, I would say that you have a loop within a loop. Either loop on its own wouldn't be so bad, but combined you multiply the time by the number of iterations in some way that causes the time to explode in a more than exponential fashion. So, there may not be a simple, "do this, and all will be well" kind of solution. It may be that you need to have a different organization of the data that deals with this particular case, whether it be a different SQL query or something else.
My usual boring signature: Nothing
-
Feb 8th, 2015, 04:35 PM
#6
Re: linQ - Need to optimize slow subquery code (.contains)
Originally Posted by Shaggy Hiker
If you can do the job in SQL when populating the tables, then that will generally be faster than any manipulation afterwards.
Sometimes direct SQL-queries will be slower than "SQL-like" querying from "InMemory-DB-like" DataStructures...
Server-Roundtrips can be expensive - and well-balanced clientside processing is worthwhile to reduce ServerLoad -
as demonstrated in modern WebApps with jQuery+Ajax and Browser-internally handled sorting, filtering etc...
of Data which was selected and transferred from the serverside in less expensive to provide "Raw-ResultSets".
Originally Posted by Shaggy Hiker
What I was suggesting is that you have found a situation where the way you have the data organized is slowing you down, so store it a different way that works for that particular problem. There is no reason why data has to be stored only one way. Memory is cheap.
Cheap memory (along with convenience) is the reason behind .NET-DataTables and SQL-like LINQ-expressions
which in combination form the same thing as I was using above - an InMemory-DB-scenario, able to buffer
"Raw-Selects" in a way, which one can later on conveniently perform "Selects, Filtering, Ordering, Joining, Grouping" on
(in a decent performance).
Originally Posted by Shaggy Hiker
I often have dictionaries for looking up one thing based on another, and sometimes I build reverse dictionaries with the same data to speed up a particular sub-case.
And exactly that should *not* be necessary when the alternative "InMemory-Convenience-Layer" (DataTables + LINQ)
is well-implemented and supports Indexing (because indexed lookups are, what you perform on a Dictionary as well).
Originally Posted by Shaggy Hiker
In your case, I would guess that the LINQ being generated builds horrible intermediate tables MANY MANY times over as it iterates through the collections.
Well, that's the case with inefficiently formulated "plain SQL" (in conjunction with missing or misplaced indexes) too.
DataTables+LINQ should be able to perform basically at the same Level as my small SQLite-InMemory example...
If that is not possible, then I'd not see the point of LINQ-integration (since the amount of Data the OP was
mentioning (20000 records or so), is still in the lower range, perfectly suited for such InMemory-massaging.
Perhaps somebody with more LINQ-experience will point out a (much) better performing expression -
I was just trying to give some hints (especially with regards to indexing also the ParentKey-Column),
where the problem might be located - and what the timings should come out like, after the problem
was solved satisfyingly.
Olaf
Last edited by Schmidt; Feb 8th, 2015 at 04:48 PM.
-
Feb 8th, 2015, 04:38 PM
#7
Re: linQ - Need to optimize slow subquery code (.contains)
I have never seen a LINQ expression written in such a way. I am no expert with data tables so if you could give me a sample i could try to re-write the expression. PLINQ would also likely be more beneficial here. I would wager a compiled regex pattern would be quicker then contains over large querys but do not quote me on that.
-
Feb 8th, 2015, 07:27 PM
#8
Thread Starter
Lively Member
Re: linQ - Need to optimize slow subquery code (.contains)
First, thanks for your inputs. I appreciate it !
Now I will give just a little more info for the program, before I answer. I think I actually found a good workaround.
Info:
1/ The database
I use a *.mdb file to keep the database. It's perfect for my needs, extrely fast also.
When the program loads, it will load a dataset.
The data scheme (tables, relations, primary keys etc..) has been studied to be as efficient as possible. It's a real database (and not a single table with everything in it as we can see sometimes)
The dataset is actually kind of a copy of the real database file. Works great.
2/ The Treeview
In my program, I have a treeview in the left side. On load, this treeview is loaded with its structure.
To explain better : the first nodes are done by the program itself, but then the user can arrange it the way he wants. He can also drag and drop, making "sub categories".
In the total, there are 4 kinds of nodes with different properties. For each node, I needed to keep a lot of different data, so I decided to store the treeview structure into a datatable. A typical database has something like 3000 nodes so a datatable was the good choice I think, to save the structure.
The action
Ok this is really straightforward : The user click on a treenode to see the files (from the chosen category) in a listview on the right side.
So when a click is done :
-a- I query the dataset, with an argument being the "ID" of the treenode.
-b- The query gives me a list of files.
-c- I populate the listview. The listview has about 8 different column, but I get everything from my query.
This usually takes about 100 milliseconds.
The engeenering
I think the way I built this is rather "kiss" (keep it small and simple)
The treenode has nothing to do with the problem I have for the linQ query. Sorry if I wasnt clear at the beginning.
The treenode is just the way the user choses its ID to show in the listview. Please don't worry about the treeview, I can get the ID's I need in two lines of code and I'm fine with it.
The recursiveness
So far, so good.
I then added an option : the recursive selection. Let's say a user want's to see all files from a category, AND all files from its children.
I have many choices :
- Query the dataset for every treenode one by one, starting with the chose node, then going through the childrens (usually there are max 15 sub categories). This is a good option and I can populate the listview each time
- Do the same and keep the results in some temp variable, then populate the listview
- Do a better query that will look for everything I need at the same time, and get everything in only one time
As you can tell, at the moment I chose the third option.
The problem
As some of you have mentionned, there are loops.
But my code is rather straighforward :
Code:
dim lnQ = (From data In tblData _
Where data.Genre_Id = "1" _
OrElse (From genre In tblStructure _
Where genre.parent_Id = "1" _
Select genre.Id).Contains(data.Genre_Id) _
Select data).ToList
First there's a subquery : It will get a simple list of the children ID's. This is extremly fast.
Second, the query for the files to populate : it check if the row has the corresponding Genre_ID, if not, another chance, it checks whether its Genre_Id is one of the childrens ID.
Where are the loops ? Well there are being created when the compiler changes the linQ into real SQL.
.Contains() == Where In
I've read that this cause the compiler to do something like "where in (a) or (b) or (c) or (n...)"
So, I guess for every row in the datatable that meet the first where clause, they are then compared to every options in the .Contains() extention, one by one. I dont know how and why, but there are loops at this moment.
This is what gives the 20 seconds for this particular query.
Some solutions
First I though I would change a little this query to put it this way :
Code:
dim subquery = (From genre In tblStructure _
Where genre.Id = "1" _
OrElse genre.parent_Id = "1" _
Select genre.Id)
dim lnQ = (From data In tblData _
Where subquery.Contains(data.Genre_Id) _
Select data)
It works, results are correct.
I just splitted the main query from the subquery.
I added the main treenode ID directly with it's children ID's (so that it only compare once instead of 2x in the previous version)
Impact on the query : NONE !
Same time.
This is where I understood the subquery is just fine. Debugging it shows me that it holds on the second query (where contains....)
What's next ?
I then though about going back to old time SELECT() method from the datatables.
I'm sure I can do it.
But I needed to try something before :
Code:
dim lnQ = (From data In tblData _
Where genre.Id = "1" _
OrElse genre.Id = "13" _
OrElse genre.Id = "15" _
OrElse genre.Id = "17" _
OrElse genre.Id = "18" _
OrElse genre.Id = "24" _
OrElse genre.Id = "25" _
OrElse genre.Id = "27" _
Select genre.Id)
Bam. Works perfectly, with less than a second.
So, either I go with something like a tblData.Select("...") and I build the corresponding query here for the dataset
or... I keep searching.
Final stage : Regex !
LinQ is just too good not to use it.
Ok, the subqueries are really bad. The only solution I found on forums and internet was this .Contains() method, which actually seems logical, if only it wasnt so slow.
Then I thought about regex.
I love regex.
This is what I came with :
Code:
lnQ = (From data In tblData.AsEnumerable() _
Where Regex.IsMatch(data.Genre_Id.ToString, strIds) _
Select data)
Of course, you understood that prior to that, I have built a "strIds" variable with a valid regex inside.
I simply use the same subquery as shows above (to get a list of ID's) and "flatter" them into a regex like this :
Code:
strIds = "^(1|13|15|17|18|24|25|27)$"
The solution
This new query is done in about 0.0.250
This is quite what I was looking for. From 20 seconds to 250 milliseconds : good.
The regex does exactly he same as the .Contains()
It just checks wheter the datarow.Genre_ID correspond to one of the possibles ID's
But, it does it in a much better way with no loops.
I love regex.
Ans now, I know I can use regex in linQ !
Conclusion
There are some other possibilities I didnt really investigate.
Sorry if I didnt really look for the UNION ALL option. Also I've read that one of the JOIN method could be a solution.
Right now, I just needed to check the datatable if the rows.Genre_Id would correspond to one of the possible ID's, and the regex thing is just great.
If anyone have another good solution, I always open to other possibilities.
Sorry for the long answer, but i needed to explain. Hopefully this can help someone in the futur.
And thanks for helping you guys made me think about it quite hard
-
Feb 8th, 2015, 07:30 PM
#9
Re: linQ - Need to optimize slow subquery code (.contains)
For a two-tier hierarchy, surely you don't need the UNION at all?
i.e.
"SELECT * FROM Nodes Where NID=@ID Union All Select * From Nodes Where PID=@ID"
is the same as
"SELECT * FROM Nodes Where NID=@ID OR PID=@ID"
If you don't know where you're going, any road will take you there...
My VB6 love-children: Vee-Hive and Vee-Launcher
-
Feb 8th, 2015, 08:11 PM
#10
Re: linQ - Need to optimize slow subquery code (.contains)
An issue is you use LINQ but don't actually use it. Whats with all them else statements? If you really wanted to abuse LINQ you could write it some thing like(example), no need for a select statement.
vb Code:
Dim query = (Enumerable.Range(0, 20).Select(Function(n) New With {.Id = n}).ToList).Where(Function(f) {1, 13, 15, 17, 18, 24, 25, 27}.Contains(f.Id)).AsParallel
-
Feb 8th, 2015, 08:33 PM
#11
Re: linQ - Need to optimize slow subquery code (.contains)
Originally Posted by ColinE66
For a two-tier hierarchy, surely you don't need the UNION at all?
i.e.
"SELECT * FROM Nodes Where NID=@ID Union All Select * From Nodes Where PID=@ID"
is the same as
"SELECT * FROM Nodes Where NID=@ID OR PID=@ID"
Yep, that's right - at least for the simplified example (with only one table) ...
I guess with the Union approach I had the OPs problem in mind, which spanned 2 tables
(a Union making it always easier for a Query-Optimizer, to choose the right Index for the partial Select -
the LINQ-optimizer perhaps being no exception in this regard)
An adaption on the OPs side (still expressed in SQL not in LINQ) could have looked like:
Select * From data Where Genre_id=@ID
Union All
Select * From data Where Genre_id In (Select id From genre Where Parent_id=@ID)
Of course also the above could be expressed without a Union, using OR -
but e.g. SQLite (at least in earlier versions) could only apply 1 properly
choosen index per Table-Select - and the common workaround was
therefore always, to use a Union instead of ORs.
I assume, that the current SQLite-versions will have no problems with a directly
OR combined result which involves more than one table (and table-index) -
a concrete test with two tables, similar to the scenario of the OP, might be helpful
to shed some more light on the current behaviour of SQLite in that regard...
Olaf
-
Feb 8th, 2015, 09:02 PM
#12
Re: linQ - Need to optimize slow subquery code (.contains)
My instinct tells me that, if NID and PID are both indexed, the example without the UNION would be quicker. Not tested, though...and off to bed. Have no idea how Sqlite might have optimized it in earlier versions: My instinct is based on past experience with Oracle and Sybase...
If you don't know where you're going, any road will take you there...
My VB6 love-children: Vee-Hive and Vee-Launcher
-
Feb 9th, 2015, 09:47 AM
#13
Re: linQ - Need to optimize slow subquery code (.contains)
Originally Posted by ColinE66
My instinct tells me that, if NID and PID are both indexed, the example without the UNION would be quicker. Not tested, though...and off to bed. Have no idea how Sqlite might have optimized it in earlier versions: My instinct is based on past experience with Oracle and Sybase...
Just for completeness (and for those who might be interested in experimenting with the .NET-lib
from sqlite.org (system.data.sqlite - which should offer comparable InMemory behaviour)...
I've adapted the example more to the OPs problem now, but performance is roughly the same as before:
Code:
Inserts on Data+Genre 344.78msec 40200 40200
Node-Rename (cascaded) 1.21msec 40200 40200
Node-Delete (cascaded) 3.42msec 39999 39999
Select Node+Lvl1-Desc. 1.15msec 201
The time for the initial Inserts is now higher, since I had to add 40200 records into each
of the two tables (Data and Genre) - both now coming with a higher ColumnCount as well.
The rest of the timings is basically as in the Single-Table example (Node-Deletes now
taking a bit longer, since deleting a Node now cascades across two tables with twice
the amount of records to delete).
The Select across two tables shows no real difference in performance here with SQLite, no matter
if I write it as a Union or with a simple OR... (timing's pretty fast, varying between 0.8msec and 1.5msec)
Below is the updated VB6-Code:
Code:
Private Const NperLvl& = 200 '<- adjust the Nodes per Level here
Private i&, j&, MemDB As New cMemDB, GenreInsert As cCommand, DataInsert As cCommand, Rs As cRecordset
Private Sub Form_Load()
MemDB.Exec "PRAGMA Foreign_Keys=ON" 'enable Foreign-Keys on this InMemory-DB
MemDB.Exec "Create Table Data( ID Text Primary Key," & _
" Genre_ID Text References Genre On Update Cascade On Delete Cascade," & _
" FileName Text" & _
") Without RowiD"
MemDB.Exec "Create Table Genre( ID Text Primary Key," & _
" Parent_ID Text References Genre On Update Cascade On Delete Cascade," & _
" GenreData Text" & _
") Without RowiD"
'create Insert-Commands for the two tables
Set DataInsert = MemDB.CreateCommand(" Insert Into Data Values(?,?,?)")
Set GenreInsert = MemDB.CreateCommand("Insert Into Genre Values(?,?,?)")
New_c.Timing True
MemDB.BeginTrans 'ensure a bit more speed for the bulk-insert, by transaction-wrapping
For i = 1 To NperLvl: InsertNode "N_" & i 'ParentNodes
For j = 1 To NperLvl: InsertNode "NC_" & i & "_" & j, "N_" & i 'ChildNodes
Next j, i
MemDB.CommitTrans
Debug.Print "Inserts on Data+Genre", New_c.Timing, MemDB.GetCount("Genre"), MemDB.GetCount("Data")
'set additional Indexes on Genre.Parent_ID and Data.Genre_ID before we proceed
MemDB.Exec "Create Index Idx_Genre_Parent_ID On Genre(Parent_ID)"
MemDB.Exec "Create Index Idx_Data_Genre_ID On Data (Genre_ID)"
New_c.Timing True 'Genre.ID-Updates will affect the ChildNodes too (due to Update Cascade, and in both Tables)
MemDB.Exec "Update Genre Set ID='N_0' Where ID='N_1'"
Debug.Print "Node-Rename (cascaded)", New_c.Timing, MemDB.GetCount("Genre"), MemDB.GetCount("Data")
New_c.Timing True 'Genre.ID-Deletes will delete the ChildNodes as well (due to Delete Cascade, and in both Tables)
MemDB.Exec "Delete From Genre Where ID='N_0'" 'delete the just renamed Node (and all its Children)
Debug.Print "Node-Delete (cascaded)", New_c.Timing, MemDB.GetCount("Genre"), MemDB.GetCount("Data")
New_c.Timing True
Set Rs = GetDataNodeAndDescendants("N_3")
Debug.Print "Select Node+Lvl1-Desc.", New_c.Timing, Rs.RecordCount
End Sub
Private Sub InsertNode(NodeID As String, Optional ParentID As String)
GenreInsert.SetText 1, NodeID
GenreInsert.SetText 2, ParentID
GenreInsert.SetText 3, "Genre Data for " & NodeID
GenreInsert.Execute
DataInsert.SetText 1, "DataID_" & NodeID
DataInsert.SetText 2, NodeID
DataInsert.SetText 3, "FileName for " & "DataID_" & NodeID
DataInsert.Execute
End Sub
Private Function GetDataNodeAndDescendants(NodeID As String) As cRecordset
Const SQL$ = "Select * From Data Where Genre_ID=@ID OR Genre_ID In (Select ID From Genre Where Parent_ID=@ID)"
'Const SQL$ = "Select * From Data Where Genre_ID=@ID Union All Select * From Data Where Genre_ID In (Select ID From Genre Where Parent_ID=@ID)"
With MemDB.CreateSelectCommand(SQL)
.SetText !ID, NodeID
Set GetDataNodeAndDescendants = .Execute
End With
End Function
Olaf
-
Feb 10th, 2015, 01:40 PM
#14
Thread Starter
Lively Member
Re: linQ - Need to optimize slow subquery code (.contains)
Originally Posted by ident
An issue is you use LINQ but don't actually use it. Whats with all them else statements? If you really wanted to abuse LINQ you could write it some thing like(example), no need for a select statement.
The query with all the Orelse ??
I dont use this code actually it was just for a quick test.
I needed to check if the problem was coming from the .Contains() method as many forum explain this method is really bad in linQ for upper databases. It's fine for small databases. (Mine is 20.000 at the moment but will go up to 50k or more for some users)
vb Code:
Dim query = (Enumerable.Range(0, 20).Select(Function(n) New With {.Id = n}).ToList).Where(Function(f) {1, 13, 15, 17, 18, 24, 25, 27}.Contains(f.Id)).AsParallel
Ok this is a bit more complicated, would have to test and understant this query
But so far I can tell, it's using a .Contains method that I'm trying to avoid now from my queries.
-
Feb 10th, 2015, 01:43 PM
#15
Thread Starter
Lively Member
Re: linQ - Need to optimize slow subquery code (.contains)
For the other codes I'll have to test also ! Thanks for the source code Olaf, I need some time to see how it performs.
For the Union All, maybe I'll try to convert it to linQ, so directly usable into a vb.net code
-
Feb 10th, 2015, 04:15 PM
#16
Re: linQ - Need to optimize slow subquery code (.contains)
Originally Posted by castaway
For the other codes I'll have to test also ! Thanks for the source code Olaf, I need some time to see how it performs.
For the Union All, maybe I'll try to convert it to linQ, so directly usable into a vb.net code
As for the (VB6)-Sources I've posted, one should be able to translate the example(s) to
.NET-syntax without larger efforts, using the appropriate .NET-wrapper-classes for this
DB-Engine, which are either available via:
http://system.data.sqlite.org
or alternatively as a more "hazzle-free-install" per NuGet-package via:
https://www.nuget.org/packages/System.Data.SQLite/
An InMemory-DB-Connection, which has Foreign Keys enabled from the get-go can be created this way:
Code:
var Cnn = new SQLiteConnection("Data Source=:memory:;Foreign Keys=True");
CommandObject-Syntax might differ in a few details, but all in all it should be
well-portable, keeping more or less the same code-structure.
An attempt to use System.Data.SQLite.Linq directly might be worthwhile as well:
http://vijayt.com/Post/Using-SQLite-...h-LINQ-to-SQL-
Would be interested to hear, how the .NET-SQLite-wrapper compares (timing-wise)
to the COM-based one I've used above.
Olaf
-
Feb 11th, 2015, 10:30 AM
#17
Thread Starter
Lively Member
Re: linQ - Need to optimize slow subquery code (.contains)
Hi
The file database I'm using is not SqLite but *.mdb (see post#8).
When I first started this project, long time ago, I choose *.mdb instead of *.sqlite because I already had *.mdb file and stored procedures knowledge.
Sqlite has done a long way since then, and maybe I'll try also to use it sometimes !
-
Feb 11th, 2015, 11:48 AM
#18
Re: linQ - Need to optimize slow subquery code (.contains)
Originally Posted by castaway
The file database I'm using is not SqLite but *.mdb (see post#8).
I'm entirely aware of that... (and wasn't really suggesting a switch from
your *.mdb to an sqlite-db-file ... not at this point at least...).
What I was trying to show was, that your "tree-like-scenario" (with most
of your *.mdb-Tables being replicated already In-Memory by DataTables,
then queried InMemory per LINQ), could be addressed also by an alternative
InMemory-scenario which was based on an SQLite-InMemory-DB...
Then using normal SQL (in SQLite -flavour) to query these alternative
InMemory-(TreeLike)Structures in a better performance...
What you currently have (or do) is (in case I understood correctly):
1) <Perform Raw-Selects on the *.mdb-File, using the appropriate methods provided by a DataSet>
...
2) <These full-table-selects end up as an InMemory-copy (a DataTable) and will be hosted there>
...
3) <Query these DataTables per LINQ>
And as I read it in your last posts, you're in the meantime down to ~250msec (from the former 25secs you started with)
The scenario with SQLite as your "alternative InMemory-DB-Storage" would work this way:
1) <Perform Raw-Selects on the *.mdb-File, transfer them directly into SQLite-InMemory-DB-Tables>
...
2) <These table-transfers perform best when using a DataReader(e.g. OleDbDataReader) in conjunction with an SQLite-CommandObject>
...
3) <Query the SQLite-InMemory-Tables either directly per "SQLite-SQL" - or per System.Data.SQLite.Linq>
- That approach wouldn't need more Memory, compared with .NET DataSets/DataTables
..(I'd guess it's roughly half, because SQLite by default stores Strings as UTF8 internally, not as WChars)
- And it will perform similar to the results I've posted above - needing roughly 0.5msec instead
..of your current 250msec - which is factor 500 - and still food for thought, on what's wrong with
..LINQ (or DataTables) in this scenario.
Olaf
-
Feb 11th, 2015, 12:32 PM
#19
Thread Starter
Lively Member
Re: linQ - Need to optimize slow subquery code (.contains)
Oh wow,
There are many alternatives indeed
Ok I will put all of this on the side for now. I will be a bit busy for work for a couple of weeks but I have taken note of all those options.
But I'll try it on my own when time will be here of me.
Thanks a lot for the help, I'll come back to this thread for any interesting news
Cast
Tags for this Thread
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|