Fastest way to compare two datatables?
I have two datatables. Each datatable contains 125 columns and about 30,000 rows. I currently am using the following method to compare them for differences:
01. DT1TableAdapter.Fill(DataSet1.DT1)
02. DT2TableAdapter.Fill(DataSet1.DT2)
03. BindingSource1.MoveFirst
04. BindingSource2.MoveFirst
05. Compare current row in BindingSource1 with current row in BindingSource2
06. If there are differences add the current two rows to a temporary datatable.
07. BindingSource2.MoveNext
08. Goto step 5 until end of BindingSource2 rows.
09. BindingSource1.MoveNext
10. Goto step 5 until end of BindingSource1 rows.
11. Print a report contains the different rows. (The report has the temporary datatable as a datasource.)
12. End
I have a fast PC and this method takes about 17 min to finish! Is that OK? if not, what is the fastest method to achieve the same goal?
Thanks
Re: Fastest way to compare two datatables?
Not knowing what the data is then there's no other way.
However, data tables are not the best objects to perform a comparison on (and, again, not knowing your comparison rules, it's difficult to come up with a faster way). You probably want to convert to fundamental objects - integers, etc. - and perform a comparison that way. Or, rather, don't fill a table adapter, or even a dataset (but it depends where your data comes from).
Re: Fastest way to compare two datatables?
Maybe you should show us your actual code. It could be that you are looping through every column in the row for failure, and not stopping that row early when you find the first failure. Maybe you are testing every row in 1 table against every row in the other table instead of row 1 vs 1 and then 2 vs 2 etc.
Also, why are you wrapping the datatable in a tableadapter, you can just use the datatable in your comparisons. I wrote a compare app to find duplicates and differences between 2 datatables, however my data was only checking a set number of columns to find a unique value. I could use sorting and add a key to better filter the data for speed. This is a data-aware scenario that SJW describes.
Re: Fastest way to compare two datatables?
it could also be possible that your requirement can completely be done on the database without loading all the data into datatables at all. but who knows... give us some more info.
Re: Fastest way to compare two datatables?
I've used the CompareDataTables in this post and it seems fast, but then again I only have small'ish tables (32x20). http://www.vbforums.com/showthread.p...rn-differences
Re: Fastest way to compare two datatables?
If the column names vary a lot you could compare these first without loading any data ... and then compare the data for columns that are only in both - eliminating some of the data you need to load and compare?
... would reduce memory too!
Kris
Re: Fastest way to compare two datatables?
Here's my code:
Code:
Dim myCON As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\myDB.mdb;Jet OLEDB:Database Password=xxxxxxxxxx"
DT1TableAdapter.Connection.ConnectionString = myCON
DT1TableAdapter.Connection.Open()
DT2TableAdapter.Connection.ConnectionString = myCON
DT2TableAdapter.Connection.Open()
DT1TableAdapter.Fill(DataSet1.DT1)
DT2TableAdapter.Fill(DataSet1.DT2)
DT1BindingSource.MoveFirst()
DT2BindingSource.MoveFirst()
For i = 0 To DT1BindingSource.Count – 1
C_ID1 = DT1BindingSource.Current("ID")
C_AC1 = DT1BindingSource.Current("AC")
DT2BindingSource.Filter = "ID=" & "'" & C_ID1 & "'"
C_AC2 = DT2BindingSource.Current("AC")
If C_AC1 <> C_AC2 Then
Dim DRV As DataRow = DirectCast(DT1BindingSource.Item(i), DataRowView).Row
DataSet1.TempDT.Rows.Add(DRV.ItemArray)
End If
DT1BindingSource.MoveNext()
Next
End If
… load the report (TempDT as datasource)
Please note that I use 'AC' column as a comparison factor because if it is different then all other columns are different as this column represents the total of all other columns' calculations. Also, please note that I need to print all the columns in the report and that's why I load them all with the TableAdapter.Fill
Thanks
Re: Fastest way to compare two datatables?
Sounds like you would be better off using a query that filter out those where the AC value matches. That should be a lot faster than a nested loop approach
Re: Fastest way to compare two datatables?
Quote:
Originally Posted by
DataMiser
Sounds like you would be better off using a query that filter out those where the AC value matches. That should be a lot faster than a nested loop approach
Sorry, I forgot to mention that DT1 and DT2 are in-memory datatables. They are a result of complex join queries of three access tables each. So if you suggest using datareaders for the query to create the temp datatable (which is required for the report) then I'm sorry to tell you that it wont work because a datareader cannot be used to fill an in-memory datatables with data it reads from another in-memory datatable! In this comparison I'm tied by too many factors I must take care of otherwise I wont reach my goal which is printing a report with the differences. I mean, if I achieved much better performance but couldn't be able to print the report then it's a fail! Thanks.
Re: Fastest way to compare two datatables?
If you have the tables in memory, the you can filter them using the DataView Class
Re: Fastest way to compare two datatables?
I really do not understand why you are using a BindingSource to iterate through/search the DataTables. It may not matter much, but I would suspect that that is a slower way to access the information than directly accessing the table rows themselves.
Here is an example of what I mean.
Code:
Sub test()
Dim t1 As New DemoTable
Dim t2 As New DemoTable
Dim diffs As DataTable = t1.Clone() ' set table structure
t2.Rows(1)("AC") = 989
t2.Rows(3)("AC") = 900
For Each row As DataRow In t1.Rows
Dim matches As DataRow() = t2.Select(String.Concat("[ID]='", row("ID").ToString, "'"))
If matches.Length > 0 AndAlso _
row.Field(Of Int32)("AC") <> matches(0).Field(Of Int32)("AC") Then
diffs.Rows.Add(matches(0).ItemArray)
End If
Next
End Sub
Friend Class DemoTable
Inherits DataTable
Public Sub New()
With Me.Columns
.Add("ID", GetType(String))
.Add("AC", GetType(Int32))
End With
With Me.Rows
.Add("fred", 23)
.Add("bill", 55)
.Add("joe", 312)
.Add("sue", 1100)
End With
End Sub
End Class