|
-
Feb 4th, 2005, 03:36 AM
#1
Thread Starter
Don't Panic!
[Information] Duplicates - A possible answer
Hi,
Came across this yesterday as my wife is doing some data processing and wanted to get rid of duplicates, yet keep the first instance. I never knew (or conveniantly forgot) about this function and I usually code up a loop to do the checks for me.
However, this may be slightly faster.
I'm going to write it in three steps although it may be accomplishes in two...
Example Data
ID - autonumber
Data - text
1 aaab
2 aabb
3 aaab
4 abba
Step one.
Make a query to get the data from your table. This should be something like:
Code:
SELECT tblData.Data,FIRST(tblData.ID) as FirstID
FROM tblData G
GROUP BY tblData.Data
Save this query (run it to check that only ids 1,2 and 4 are returned)
Step two.
Make a table to hold these matched IDs (tblMatched - ID - Number-long)
Make an append query using the above query to put the Ids into the tblMatched - Save this too.
** Edit:
Something like this
Code:
INSERT INTO tblMatched ( ID )
SELECT qryFirst.ID
FROM qryFirst
Step three.
Make a third query to delete (or flag up in a field or do whatever) the duplicated records, using a left join. Something like the following:
Code:
DELETE *.tblData
FROM tblData Left Join tblMatched ON tblData.ID=tblMatched.ID
WHERE tblMatched.id is Null
Save this too.
You could use an update query if you only want to flag those record and not delete them.
To use, delete everything from tblMatched, then run the append query then run the delete (update?) query
Vince
Last edited by Ecniv; Feb 4th, 2005 at 12:40 PM.
Feeling like a fly on the inside of a closed window (Thunk!)
If I post a lot, it is because I am bored at work! ;D Or stuck...
* Anything I post can be only my opinion. Advice etc is up to you to persue...
-
Feb 4th, 2005, 04:13 AM
#2
Fanatic Member
Re: [Information] Duplicates - A possible answer
..wanted to get rid of duplicates
hm..what duplicates?
from ur example ID hold autonumber field which is not duplicate..
Step two.
Make a table to hold these matched IDs (tblMatched - ID - Number-long)
Make an append query using the above query to put the Ids into the tblMatched - Save this too.
can u show the code for it??
Code:
DELETE *.tblData
FROM tblData Left Join tblMatched ON tblData.ID=tblMatched.ID
WHERE tblMatched.id is Null
if it's NULL..can it be Joined?
sorry if i'm not quite understand
-
Feb 4th, 2005, 07:57 AM
#3
Re: [Information] Duplicates - A possible answer
ecniv - very nice...
The duplicate data is what is being GROUP BY - tblData.data
The FIRST() function (I didn't know this existed either MIN() and MAX() sure - but never heard of FIRST()) - is returning the "unique identifier" of the row that is to be saved.
The NULL is a by-product of the LEFT-JOIN - LEFT-JOIN gives us NULLS in the other columns when the row doesn't match the TBLMATCHID - which is the single row we want to retain from the GROUP BY.
It seems this could all be done in a single query with a sub-query or EXISTS clause - I might play with this when I get to work.
-
Feb 4th, 2005, 08:26 AM
#4
Thread Starter
Don't Panic!
Re: [Information] Duplicates - A possible answer
Erick:
See szlamany's post - I was too slow to answer that sorry.
Duplicate is the fourth line of data - same text different ID
The append would look something like this (apologies in advance for getting in the wrong place - from my head - use the query builder and check the sql statement )
Code:
INSERT INTO tblMatching ( ID )
SELECT qryFirst.ID
FROM qryFirst
qryFirst is the query to pull the first
szlamany:
I don't think it can go in one query because you are pointing to the table and using the group by / first function makes the recordset non updatable. :/ shame really. But still the update may be compressed into one query 
Also there is 'Last' function... bit like max I guess...
Using IDs max would work too (assuming in date order).
Vince
Feeling like a fly on the inside of a closed window (Thunk!)
If I post a lot, it is because I am bored at work! ;D Or stuck...
* Anything I post can be only my opinion. Advice etc is up to you to persue...
-
Feb 4th, 2005, 09:02 AM
#5
Re: [Information] Duplicates - A possible answer
MAX and MIN don't work on all datatypes - I guess that's why we have FIRST and LAST.
Are you using MS SQL SERVER?
-
Feb 4th, 2005, 10:05 AM
#6
Thread Starter
Don't Panic!
Re: [Information] Duplicates - A possible answer
Feeling like a fly on the inside of a closed window (Thunk!)
If I post a lot, it is because I am bored at work! ;D Or stuck...
* Anything I post can be only my opinion. Advice etc is up to you to persue...
-
Feb 4th, 2005, 01:08 PM
#7
Thread Starter
Don't Panic!
-
Feb 4th, 2005, 08:50 PM
#8
Fanatic Member
Re: [Information] Duplicates - A possible answer
Duplicate is the fourth line of data - same text different ID
my mistake..i always think the PK is the first Column..so not give a more look to Data field and ur GROUP BY Clause..sorry
and because of that it make the DELETE Query make sense to me now
nice work Ecniv
EDIT : it sure useful for me
thx
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|