dcsimg
Results 1 to 8 of 8

Thread: [Information] Duplicates - A possible answer

  1. #1

    Thread Starter
    Don't Panic! Ecniv's Avatar
    Join Date
    Nov 2000
    Location
    Amsterdam...
    Posts
    5,343

    Exclamation [Information] Duplicates - A possible answer

    Hi,

    Came across this yesterday as my wife is doing some data processing and wanted to get rid of duplicates, yet keep the first instance. I never knew (or conveniantly forgot) about this function and I usually code up a loop to do the checks for me.

    However, this may be slightly faster.
    I'm going to write it in three steps although it may be accomplishes in two...

    Example Data
    ID - autonumber
    Data - text

    1 aaab
    2 aabb
    3 aaab
    4 abba

    Step one.
    Make a query to get the data from your table. This should be something like:
    Code:
    SELECT tblData.Data,FIRST(tblData.ID) as FirstID
    FROM tblData G
    GROUP BY tblData.Data
    Save this query (run it to check that only ids 1,2 and 4 are returned)

    Step two.
    Make a table to hold these matched IDs (tblMatched - ID - Number-long)
    Make an append query using the above query to put the Ids into the tblMatched - Save this too.
    ** Edit:
    Something like this
    Code:
    INSERT INTO tblMatched ( ID )
    SELECT qryFirst.ID
    FROM qryFirst

    Step three.
    Make a third query to delete (or flag up in a field or do whatever) the duplicated records, using a left join. Something like the following:
    Code:
    DELETE *.tblData
    FROM tblData Left Join tblMatched ON tblData.ID=tblMatched.ID
    WHERE tblMatched.id is Null
    Save this too.
    You could use an update query if you only want to flag those record and not delete them.


    To use, delete everything from tblMatched, then run the append query then run the delete (update?) query



    Vince
    Last edited by Ecniv; Feb 4th, 2005 at 12:40 PM.

    BOFH Now, BOFH Past, Information on duplicates

    Feeling like a fly on the inside of a closed window (Thunk!)
    If I post a lot, it is because I am bored at work! ;D Or stuck...
    * Anything I post can be only my opinion. Advice etc is up to you to persue...

  2. #2
    Fanatic Member
    Join Date
    Sep 2004
    Location
    Jakarta, Indonesia
    Posts
    818

    Re: [Information] Duplicates - A possible answer

    ..wanted to get rid of duplicates
    hm..what duplicates?
    from ur example ID hold autonumber field which is not duplicate..

    Step two.
    Make a table to hold these matched IDs (tblMatched - ID - Number-long)
    Make an append query using the above query to put the Ids into the tblMatched - Save this too.
    can u show the code for it??

    Code:
    DELETE *.tblData
    FROM tblData Left Join tblMatched ON tblData.ID=tblMatched.ID
    WHERE tblMatched.id is Null
    if it's NULL..can it be Joined?

    sorry if i'm not quite understand

    1st NF - a table should not contain repeating groups.
    2nd NF - any fields that do not depend fully on the primary key should be moved to another table.
    3rd NF - there should be no dependency between non key fields in same table.
    - E. Petroutsos -


    eRiCk

    A collection of "Laku-abis" Ebook, Permanent Residence

    Access Reserved Words, a Classic Form Bug, Access Limitation, Know run Process and the Lock they hold in, Logging User Activity in MSSQL,
    Kill Database Processes

  3. #3
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    CT
    Posts
    17,843

    Re: [Information] Duplicates - A possible answer

    ecniv - very nice...

    The duplicate data is what is being GROUP BY - tblData.data

    The FIRST() function (I didn't know this existed either MIN() and MAX() sure - but never heard of FIRST()) - is returning the "unique identifier" of the row that is to be saved.

    The NULL is a by-product of the LEFT-JOIN - LEFT-JOIN gives us NULLS in the other columns when the row doesn't match the TBLMATCHID - which is the single row we want to retain from the GROUP BY.

    It seems this could all be done in a single query with a sub-query or EXISTS clause - I might play with this when I get to work.

  4. #4

    Thread Starter
    Don't Panic! Ecniv's Avatar
    Join Date
    Nov 2000
    Location
    Amsterdam...
    Posts
    5,343

    Re: [Information] Duplicates - A possible answer

    Erick:
    See szlamany's post - I was too slow to answer that sorry.
    Duplicate is the fourth line of data - same text different ID

    The append would look something like this (apologies in advance for getting in the wrong place - from my head - use the query builder and check the sql statement )
    Code:
    INSERT INTO tblMatching ( ID )
    SELECT qryFirst.ID
    FROM qryFirst
    qryFirst is the query to pull the first

    szlamany:
    I don't think it can go in one query because you are pointing to the table and using the group by / first function makes the recordset non updatable. :/ shame really. But still the update may be compressed into one query
    Also there is 'Last' function... bit like max I guess...


    Using IDs max would work too (assuming in date order).


    Vince

    BOFH Now, BOFH Past, Information on duplicates

    Feeling like a fly on the inside of a closed window (Thunk!)
    If I post a lot, it is because I am bored at work! ;D Or stuck...
    * Anything I post can be only my opinion. Advice etc is up to you to persue...

  5. #5
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    CT
    Posts
    17,843

    Re: [Information] Duplicates - A possible answer

    MAX and MIN don't work on all datatypes - I guess that's why we have FIRST and LAST.

    Are you using MS SQL SERVER?

  6. #6

    Thread Starter
    Don't Panic! Ecniv's Avatar
    Join Date
    Nov 2000
    Location
    Amsterdam...
    Posts
    5,343

    Re: [Information] Duplicates - A possible answer

    MS Access 2003

    BOFH Now, BOFH Past, Information on duplicates

    Feeling like a fly on the inside of a closed window (Thunk!)
    If I post a lot, it is because I am bored at work! ;D Or stuck...
    * Anything I post can be only my opinion. Advice etc is up to you to persue...

  7. #7

    Thread Starter
    Don't Panic! Ecniv's Avatar
    Join Date
    Nov 2000
    Location
    Amsterdam...
    Posts
    5,343

    Re: [Information] Duplicates - A possible answer

    OK !!!


    Addenum. Yesterday I ran this and it worked

    Today no go. Maybe because I have more than the examples two fields (id and data) today I have ID, data1,data2,data3 and data4.
    Complains that it cannot delete from specified tables (crap!!)

    A work around is to use an update query instead and flag a y/n field to true for those that are duplicated.

    Then use a delete query on the flagged fields....grr ms....


    If you manage to get the straight deletion to work, please please post up
    Anyway - hope this is useful to someone somewhere



    **** IMPORTANT ****
    On the tblMatched the id must be a primary key... Then deletes no problem.

    Very weird, but I guess there is some logic somewhere about this.
    Last edited by Ecniv; Feb 4th, 2005 at 01:23 PM.

    BOFH Now, BOFH Past, Information on duplicates

    Feeling like a fly on the inside of a closed window (Thunk!)
    If I post a lot, it is because I am bored at work! ;D Or stuck...
    * Anything I post can be only my opinion. Advice etc is up to you to persue...

  8. #8
    Fanatic Member
    Join Date
    Sep 2004
    Location
    Jakarta, Indonesia
    Posts
    818

    Re: [Information] Duplicates - A possible answer

    Duplicate is the fourth line of data - same text different ID
    my mistake..i always think the PK is the first Column..so not give a more look to Data field and ur GROUP BY Clause..sorry

    and because of that it make the DELETE Query make sense to me now
    nice work Ecniv

    EDIT : it sure useful for me

    thx

    1st NF - a table should not contain repeating groups.
    2nd NF - any fields that do not depend fully on the primary key should be moved to another table.
    3rd NF - there should be no dependency between non key fields in same table.
    - E. Petroutsos -


    eRiCk

    A collection of "Laku-abis" Ebook, Permanent Residence

    Access Reserved Words, a Classic Form Bug, Access Limitation, Know run Process and the Lock they hold in, Logging User Activity in MSSQL,
    Kill Database Processes

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Featured


Click Here to Expand Forum to Full Width