[Information] Duplicates - A possible answer
Hi,
Came across this yesterday as my wife is doing some data processing and wanted to get rid of duplicates, yet keep the first instance. I never knew (or conveniantly forgot) about this function and I usually code up a loop to do the checks for me.
However, this may be slightly faster.
I'm going to write it in three steps although it may be accomplishes in two...
Example Data
ID - autonumber
Data - text
1 aaab
2 aabb
3 aaab
4 abba
Step one.
Make a query to get the data from your table. This should be something like:
Code:
SELECT tblData.Data,FIRST(tblData.ID) as FirstID
FROM tblData G
GROUP BY tblData.Data
Save this query (run it to check that only ids 1,2 and 4 are returned)
Step two.
Make a table to hold these matched IDs (tblMatched - ID - Number-long)
Make an append query using the above query to put the Ids into the tblMatched - Save this too.
** Edit:
Something like this
Code:
INSERT INTO tblMatched ( ID )
SELECT qryFirst.ID
FROM qryFirst
Step three.
Make a third query to delete (or flag up in a field or do whatever) the duplicated records, using a left join. Something like the following:
Code:
DELETE *.tblData
FROM tblData Left Join tblMatched ON tblData.ID=tblMatched.ID
WHERE tblMatched.id is Null
Save this too.
You could use an update query if you only want to flag those record and not delete them.
To use, delete everything from tblMatched, then run the append query then run the delete (update?) query
:thumb:
Vince
Re: [Information] Duplicates - A possible answer
Quote:
..wanted to get rid of duplicates
hm..what duplicates?
from ur example ID hold autonumber field which is not duplicate..
Quote:
Step two.
Make a table to hold these matched IDs (tblMatched - ID - Number-long)
Make an append query using the above query to put the Ids into the tblMatched - Save this too.
can u show the code for it??
Code:
DELETE *.tblData
FROM tblData Left Join tblMatched ON tblData.ID=tblMatched.ID
WHERE tblMatched.id is Null
if it's NULL..can it be Joined?
sorry if i'm not quite understand
Re: [Information] Duplicates - A possible answer
ecniv - very nice...
The duplicate data is what is being GROUP BY - tblData.data
The FIRST() function (I didn't know this existed either MIN() and MAX() sure - but never heard of FIRST()) - is returning the "unique identifier" of the row that is to be saved.
The NULL is a by-product of the LEFT-JOIN - LEFT-JOIN gives us NULLS in the other columns when the row doesn't match the TBLMATCHID - which is the single row we want to retain from the GROUP BY.
It seems this could all be done in a single query with a sub-query or EXISTS clause - I might play with this when I get to work.
Re: [Information] Duplicates - A possible answer
Erick:
See szlamany's post - I was too slow to answer that sorry.
Duplicate is the fourth line of data - same text different ID
The append would look something like this (apologies in advance for getting in the wrong place - from my head - use the query builder and check the sql statement :) )
Code:
INSERT INTO tblMatching ( ID )
SELECT qryFirst.ID
FROM qryFirst
qryFirst is the query to pull the first
szlamany:
I don't think it can go in one query because you are pointing to the table and using the group by / first function makes the recordset non updatable. :/ shame really. But still the update may be compressed into one query :)
Also there is 'Last' function... bit like max I guess...
Using IDs max would work too (assuming in date order).
Vince
Re: [Information] Duplicates - A possible answer
MAX and MIN don't work on all datatypes - I guess that's why we have FIRST and LAST.
Are you using MS SQL SERVER?
Re: [Information] Duplicates - A possible answer
Re: [Information] Duplicates - A possible answer
OK !!!
Addenum. Yesterday I ran this and it worked
Today no go. Maybe because I have more than the examples two fields (id and data) today I have ID, data1,data2,data3 and data4.
:cry: Complains that it cannot delete from specified tables (crap!!)
A work around is to use an update query instead and flag a y/n field to true for those that are duplicated.
Then use a delete query on the flagged fields....grr ms....
If you manage to get the straight deletion to work, please please post up :)
Anyway - hope this is useful to someone somewhere :eek:
**** IMPORTANT ****
On the tblMatched the id must be a primary key... Then deletes no problem.
Very weird, but I guess there is some logic somewhere about this.
Re: [Information] Duplicates - A possible answer
Quote:
Duplicate is the fourth line of data - same text different ID
my mistake..i always think the PK is the first Column..so not give a more look to Data field and ur GROUP BY Clause..sorry :blush:
and because of that it make the DELETE Query make sense to me now
nice work Ecniv
EDIT : it sure useful for me :thumb:
thx