Results 1 to 6 of 6

Thread: removing duplicates from a recordset

  1. #1

    Thread Starter
    Lively Member
    Join Date
    Nov 1999
    Posts
    88
    i'm looking for a method of removing duplicate records from a recordset. my first attempt went something like the following:

    'establishes connection
    CreateConnection
    'opens the recordset that contains all records, incl all dups
    EstMasterRS

    'query the main recordset for duplicates
    Do Until rsFF.EOF = True And rsFF.BOF = True
    'set the variables holding variables equal to current record
    strHold0 = rsFF.Fields(0)
    strHold1 = rsFF.Fields(1)
    strHold2 = rsFF.Fields(2)

    'query all records that have strHold0 as a control no
    sqlControl1 = "UPDATE tblFF SET delete = 1, reviewed = 1" & _
    " WHERE FF_ControlNo = '" & strHold0 & "' AND" & _
    " FF_Desc = '" & strHold1 & "' AND" & _
    " FF_CCUnit = '" & strHold2 & "'"
    'slow
    rsControl1.Open sqlControl1, conEquip, adOpenForwardOnly, adLockOptimistic

    rsControl1.MoveNext

    'if recordcount = 1 then the record is unique and marked reviewed
    If rsControl1.EOF = False Or rsControl1.BOF = False Then
    'if recordcount > 1 then then the rs is searched for dup info
    'slow
    rsControl1.Fields(4) = vbTrue
    'rsControl1.Update
    rsControl1.MoveNext
    Do Until rsControl1.EOF = True

    'if criteria is true than this dup info. mark for delete
    If strHold1 = rsControl1.Fields(1) And _
    strHold2 = rsControl1.Fields(2) Then
    rsControl1.Fields(3) = vbTrue
    rsControl1.Fields(4) = vbTrue
    'rsControl1.Update
    Else
    'if criteria is not true than this is unique info
    rsControl1.Fields(4) = vbTrue
    'rsControl1.Update
    End If
    rsControl1.MoveNext
    Loop
    End If
    'requery master rs because source has changed
    'slow
    rsFF.Requery
    'update status
    rsControl1.UpdateBatch
    rsControl1.Close
    Loop

    all lines preceded with "slow" are slowing down my program (duh! ). is there a more efficient way of doing this? i have roughly 60,000 records to deal with and the above program took 14 hours to process less than 25% of them. ideally i would like to be able to do all of my updating from one recordset, but i have been unable to figure out how. any suggestions?

  2. #2
    Fanatic Member Psyrus's Avatar
    Join Date
    Jul 2000
    Location
    NJ
    Posts
    602
    One thing that will speed things up a little bit is using With/End With. Although with that many records I don't know if it will make much of a difference.

    example:

    'query the main recordset for duplicates

    'Begin the first With here
    With rsFF

    Do Until .EOF = True And .BOF = True
    'set the variables holding variables equal to current record
    strHold0 = .Fields(0)
    strHold1 = .Fields(1)
    strHold2 = .Fields(2)

    End With
    'End the first with


    And use it beginning here:

    'slow
    'Begin the second With
    With rsControl1
    .Open sqlControl1, conEquip, adOpenForwardOnly, adLockOptimistic

    .MoveNext


    In the middle remove each occurence of rsControl1 at the dot operator


    And end it here:

    'update status
    .UpdateBatch
    .Close

    End With
    'End the second with
    Loop


    Chris

    VB 6.0 Calendar App Video Gamers Group
    Don't forget to rate people if they helped you.

  3. #3
    Guru Clunietp's Avatar
    Join Date
    Oct 1999
    Location
    USA
    Posts
    1,844
    Is the purpose of all of this is to remove duplicate records from your DB? What are you trying to accomplish?

  4. #4

    Thread Starter
    Lively Member
    Join Date
    Nov 1999
    Posts
    88
    i am trying to migrate a non-relational database to a relational database. there is a lot of duplicate information that i am currently trying to remove. my next step is to identify the inconsistencies in the data.

  5. #5
    Guest

    Lightbulb

    Why not get a record set from your database using the SQL DISTINCT command as in:

    Select Distinct (Field(0) + Field(1) + Field(2)) From tblUrOldTable Into tblUrNewTable

    The Distinct will check the value of the concatenation of the three fields for each record and if the value already exists in the table, will leave it out of the new recordset.

    Post a message here and let me know if it works.




  6. #6

    Thread Starter
    Lively Member
    Join Date
    Nov 1999
    Posts
    88
    jlopez,

    last night around midnight, i was reviewing my "SQL for Dummies" book and found the "DISTINCT" keyword and immediatly plugged it in to my sql statement. what a performance increase! the original code took 16 hours to process less than 25% of the records while the new code took less than 90 seconds. i will have to read "SQL for Dummies" more closely. sorry i didn't post that i figured-out the problem, but thanks for your input.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width