PDA

Click to See Complete Forum and Search --> : removing duplicates from a recordset


dwhawley
Jul 30th, 2000, 10:13 AM
i'm looking for a method of removing duplicate records from a recordset. my first attempt went something like the following:

'establishes connection
CreateConnection
'opens the recordset that contains all records, incl all dups
EstMasterRS

'query the main recordset for duplicates
Do Until rsFF.EOF = True And rsFF.BOF = True
'set the variables holding variables equal to current record
strHold0 = rsFF.Fields(0)
strHold1 = rsFF.Fields(1)
strHold2 = rsFF.Fields(2)

'query all records that have strHold0 as a control no
sqlControl1 = "UPDATE tblFF SET delete = 1, reviewed = 1" & _
" WHERE FF_ControlNo = '" & strHold0 & "' AND" & _
" FF_Desc = '" & strHold1 & "' AND" & _
" FF_CCUnit = '" & strHold2 & "'"
'slow
rsControl1.Open sqlControl1, conEquip, adOpenForwardOnly, adLockOptimistic

rsControl1.MoveNext

'if recordcount = 1 then the record is unique and marked reviewed
If rsControl1.EOF = False Or rsControl1.BOF = False Then
'if recordcount > 1 then then the rs is searched for dup info
'slow
rsControl1.Fields(4) = vbTrue
'rsControl1.Update
rsControl1.MoveNext
Do Until rsControl1.EOF = True

'if criteria is true than this dup info. mark for delete
If strHold1 = rsControl1.Fields(1) And _
strHold2 = rsControl1.Fields(2) Then
rsControl1.Fields(3) = vbTrue
rsControl1.Fields(4) = vbTrue
'rsControl1.Update
Else
'if criteria is not true than this is unique info
rsControl1.Fields(4) = vbTrue
'rsControl1.Update
End If
rsControl1.MoveNext
Loop
End If
'requery master rs because source has changed
'slow
rsFF.Requery
'update status
rsControl1.UpdateBatch
rsControl1.Close
Loop

all lines preceded with "slow" are slowing down my program (duh! :) ). is there a more efficient way of doing this? i have roughly 60,000 records to deal with and the above program took 14 hours to process less than 25% of them. ideally i would like to be able to do all of my updating from one recordset, but i have been unable to figure out how. any suggestions?

Psyrus
Jul 30th, 2000, 11:23 AM
One thing that will speed things up a little bit is using With/End With. Although with that many records I don't know if it will make much of a difference.

example:

'query the main recordset for duplicates

'Begin the first With here
With rsFF

Do Until .EOF = True And .BOF = True
'set the variables holding variables equal to current record
strHold0 = .Fields(0)
strHold1 = .Fields(1)
strHold2 = .Fields(2)

End With
'End the first with


And use it beginning here:

'slow
'Begin the second With
With rsControl1
.Open sqlControl1, conEquip, adOpenForwardOnly, adLockOptimistic

.MoveNext


In the middle remove each occurence of rsControl1 at the dot operator


And end it here:

'update status
.UpdateBatch
.Close

End With
'End the second with
Loop

Clunietp
Jul 30th, 2000, 11:38 AM
Is the purpose of all of this is to remove duplicate records from your DB? What are you trying to accomplish?

dwhawley
Jul 31st, 2000, 08:01 AM
i am trying to migrate a non-relational database to a relational database. there is a lot of duplicate information that i am currently trying to remove. my next step is to identify the inconsistencies in the data.

Aug 1st, 2000, 12:52 PM
Why not get a record set from your database using the SQL DISTINCT command as in:

Select Distinct (Field(0) + Field(1) + Field(2)) From tblUrOldTable Into tblUrNewTable

The Distinct will check the value of the concatenation of the three fields for each record and if the value already exists in the table, will leave it out of the new recordset.

Post a message here and let me know if it works.

dwhawley
Aug 1st, 2000, 03:37 PM
jlopez,

last night around midnight, i was reviewing my "SQL for Dummies" book and found the "DISTINCT" keyword and immediatly plugged it in to my sql statement. what a performance increase! the original code took 16 hours to process less than 25% of the records while the new code took less than 90 seconds. i will have to read "SQL for Dummies" more closely. sorry i didn't post that i figured-out the problem, but thanks for your input.