-
Jan 15th, 2011, 07:09 AM
#1
[RESOLVED] VB6 / Access - Removing Duplicate Records
Ok, I'm no Database Expert.... I assume I'm in the correct forum.....
I have a Table with about 41,000 records (and growing). The Table has about 40 Columns. It is automatically updated (by a third party) via SMS messages. Occasionally Duplicate Messages are received, where a 'Duplicate Message' is defined by one where all columns except 4 are identical to another record.
My task is to remove the duplicate records and somehow inform the caller how many duplicates were deleted.
The approach I have taken is to create a copy of the original recordset, sort the copy into a suitable order (by Columns "PreviousMessageDate" and "SerialNo"), and then check each record aginst the next record and delete it if it matches. Although slow, this seems to work. The problem I have is in trying to report the number of duplicates deleted per 'SerialNo'. I have tried Appending a new Field ("Duplicates") in the Recordset, but in order to do that I have to Close it, and having Closed it I can't seem to re-Open it and access the original records.
Here's the code
Code:
Option Explicit
Private db As ADODB.Connection
Private rs As ADODB.Recordset
Private Sub Form_Load()
Dim strSQL As String
Dim rs1 As ADODB.Recordset
Set db = New ADODB.Connection
db.CursorLocation = adUseClient
db.Open "PROVIDER=Microsoft.Jet.OLEDB.4.0;Data Source=\\HOME-VISTA\Public\PAYG.mdb;"
Set rs = New ADODB.Recordset
strSQL = "SELECT * FROM RemoteAuditT"
rs.Open strSQL, db, adOpenStatic, adLockOptimistic
If rs.RecordCount <> 0 Then
Set rs1 = New ADODB.Recordset
Set rs1 = RemoveDups(rs)
End If
'
' Rest of the Application Code goes here....
'
End Sub
Private Function RemoveDups(rs As ADODB.Recordset) As ADODB.Recordset
Dim rsTemp As ADODB.Recordset
Dim intI As Integer
Dim intRec As Integer
Dim intDup As Integer
Dim strCurField As String
Dim strNewField As String
Set rsTemp = New ADODB.Recordset
Set rsTemp = CreateCopy(rs)
rsTemp.Open
rsTemp.MoveFirst ' This is where I get an
' either BOF or EOF is true..etc Message
rsTemp.Sort = "PreviousMessageDate,SerialNo"
strCurField = CopyFields(rsTemp)
rsTemp.MoveNext
Do
strNewField = CopyFields(rsTemp)
If strNewField = strCurField Then
rsTemp.Delete adAffectCurrent
intDup = intDup + 1
rsTemp.MovePrevious
rsTemp.Fields("Duplicates").Value = intDup
Else
strCurField = strNewField
intDup = 0
End If
rsTemp.MoveNext
Loop Until rsTemp.EOF
End Function
Private Function CopyFields(rs As ADODB.Recordset) As String
'
' Create a String holding the values for the required fields
' of the current record
'
Dim intI As Integer
Dim strFields As String
For intI = 0 To rs.Fields.Count - 1
If rs.Fields(intI).Name <> "RMA" And _
rs.Fields(intI).Name <> "ServerTimeAuditReceived" And _
rs.Fields(intI).Name <> "TXSignalStrength" And _
rs.Fields(intI).Name <> "RXSignalStrength" Then
strFields = strFields & rs.Fields(intI).Value
End If
Next intI
CopyFields = strFields
End Function
Private Function CreateCopy(rs As ADODB.Recordset) As ADODB.Recordset
'
' Create a Copy of the RecordSet
' and add a new Field to the copy
'
Dim pb As New PropertyBag
Dim rsTemp As ADODB.Recordset
pb.WriteProperty "Copy", rs
Set rsTemp = pb.ReadProperty("Copy")
rsTemp.Close
rsTemp.Fields.Append "Duplicates", adInteger, 2, adFldMayBeNull
Set CreateCopy = rsTemp
Set rsTemp = Nothing
End Function
I originally used the .Clone Method to create the Copy,but ran into the same issues regarding reporting the Duplicates. The 'Clone' method had the advantage of actually deleting the records from the Table/original Recordset.
The actual data in the Table is not updated or modified by the target application.
I suppose I could just return a Comma separated string: SerialNo1,Dups,SerialNo2,Dups etc and let the caller Split and process it, but I'm rather hoping there's a technique / ADODB 'feature' that is somewhat 'tidier'.
Any suggestions as to where I'm going wrong or a better method of doing what's required would be gratefully received.
Last edited by Doogle; Jan 15th, 2011 at 07:37 AM.
Reason: Added some more Info
-
Jan 15th, 2011, 02:48 PM
#2
Re: VB6 / Access - Removing Duplicate Records
There are several ways to manage dupplicate records...
you may:
- loop through all records and try to find a match and based on your findings insert or refuse
- check for dups with simple sql using Having count(some key field) > 1 and insert distinct record into some "stage" table (this can be done with one statement)
- then, you can delete all dups from main table and finally insert all from the satging (you will of course need to purge stage table at the end)
- etc, etc, etc...
However, why not definingg unque constraint on you table to disallow dups in the first plae?
-
Jan 15th, 2011, 02:53 PM
#3
Re: VB6 / Access - Removing Duplicate Records
Btw, if you can define unique costraint then each rejected record can be recorded into some dups table.
Once it's setup you will be able to easily "inform the caller how many duplicates were rejected" (name of your caller can be stored in the dups table).
-
Jan 15th, 2011, 02:56 PM
#4
Re: VB6 / Access - Removing Duplicate Records
Originally Posted by RhinoBull
However, why not definingg unque constraint on you table to disallow dups in the first plae?
Aye, there's the rub. As I said, the updating is via a third party and until they can get their act together, muggins (ie. me)has to cope with it !!
-
Jan 15th, 2011, 03:41 PM
#5
Re: VB6 / Access - Removing Duplicate Records
But if you have access to table design why can't you define the constraint yourself?
-
Jan 16th, 2011, 03:14 AM
#6
Re: VB6 / Access - Removing Duplicate Records
I forgot to say thanks for the advice.
Basically, there are a few hundred machines dotted about the country which at a certain time of day, report usage, status and other statistics to a Table in a Central Database via SMS Messaging. This update is effectively just a dump of whatever is received into the Table - no validation is performed. The only thing we can rely on is that each record has a unique ascending key (RMA). Not a particularly clever design but it's the one we have to work with.
We do not have the ability to change the design of the Table in the Central Database. It is under the control of the people who make the hardware that populates it. The fact that duplicates can be generated may indicate a fault in a particular machine, or some sort of 'manual intervention', which we need to be awae of.
We read this table, identifying new records (by 'remembering' the last used RMA), add those into our Table and produce statistics and Invoices.
Perhaps we should add a 'Duplicates' column to our Table and in the reading process check for duplicates, count, report and remove them. I was rather hoping we didn't need to do that, but on reflection, and on the advice you've given I think it's going to be the most managable solution.
The application has been running for about a year and the duplicates issue arose in December 2010 - pity no one told us of the possibility when we started, or rather it's a pity we didn't think to ask, we could have catered for them in our original design - ah the benefits of 20/20 hindsight and thorough analysis and design
Last edited by Doogle; Jan 16th, 2011 at 03:24 AM.
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|