Results 1 to 5 of 5

Thread: [RESOLVED] How to detect and list all records in a table that are duplicated.

  1. #1
    Fanatic Member
    Join Date
    Dec 07
    Location
    Albacete, españa
    Posts
    579

    Resolved [RESOLVED] How to detect and list all records in a table that are duplicated.

    I think that SQL can do this, but, despite searching the internet, I cannot find anything to solve this problem. All I found was checking for duplicates before the records are added.
    Basically, if there are 2 identical records already existing in the database, I want to identify and list them (with the Key).
    I think the SQL string is something along the lines of "SELECT pyID, pyStudent, pyClass, pyPaid FROM tblPay WHERE pyStudent, pyPaid, pyClass ARE THE SAME IN 2 OR MORE RECORDS AND COUNT(pyClass)>1" or something like that. Is it possible? I've been scratching my head for days on this!
    A fun card game written in VBA within Excel Tri Peaks

  2. #2
    Super Moderator si_the_geek's Avatar
    Join Date
    Jul 02
    Location
    Bristol, UK
    Posts
    35,558

    Re: How to detect and list all records in a table that are duplicated.

    Returning the dupes is something like this:
    Code:
    SELECT pyStudent, pyClass, pyPaid 
    FROM tblPay 
    GROUP BY pyStudent, pyPaid, pyClass 
    HAVING COUNT(*)>1
    ...but in order to get the values for the extra field, you either need to get only the min/max value (just add it to the Select list), or use the above as a sub-query to join to, eg:
    Code:
    SELECT P1.pyID, P1.pyStudent, P1.pyClass, P1.pyPaid 
    FROM tblPay P1
    INNER JOIN (
      SELECT pyStudent, pyClass, pyPaid 
      FROM tblPay 
      GROUP BY pyStudent, pyPaid, pyClass 
      HAVING COUNT(*)>1
    ) as P2 on (P1.pyStudent = P2 pyStudent AND P1.pyPaid = P2.pyPaid AND P1.pyClass = P2.pyClass)

  3. #3
    Fanatic Member
    Join Date
    Dec 07
    Location
    Albacete, españa
    Posts
    579

    Re: How to detect and list all records in a table that are duplicated.

    Quote Originally Posted by si_the_geek View Post
    Code:
    SELECT P1.pyID, P1.pyStudent, P1.pyClass, P1.pyPaid 
    FROM tblPay P1
    INNER JOIN (
      SELECT pyStudent, pyClass, pyPaid 
      FROM tblPay 
      GROUP BY pyStudent, pyPaid, pyClass 
      HAVING COUNT(*)>1
    ) as P2 on (P1.pyStudent = P2 pyStudent AND P1.pyPaid = P2.pyPaid AND P1.pyClass = P2.pyClass)
    OK, to check I understand how it works, and it does work, thank you Si:
    GROUP BY sorts the records and puts all equal ones together
    HAVING COUNT(*)>1 selects and returns those that are the same as the record next to it (ie duplicates)
    AS P2 sets this selection to a temporary (and arbitrary) variable so it can be referenced in the main part of the SELECT
    INNER JOIN ON (... selects again the records selected as duplicates from the entire list again and The initial SELECT shows the fields I want to return.
    This is how I think it is working, please correct me if I'm wrong.
    One thing I don't understand is "FROM tblPay P1" to assign P1 but in the INNER JOIN, you assign with "AS P2". I would have expected "FROM tblPay AS P1"???
    A fun card game written in VBA within Excel Tri Peaks

  4. #4
    Super Moderator si_the_geek's Avatar
    Join Date
    Jul 02
    Location
    Bristol, UK
    Posts
    35,558

    Re: How to detect and list all records in a table that are duplicated.

    You've pretty much got it.


    Group By merges records together where the fields you specify are equal. You must specify all fields in the Select list that don't have an aggregate function (eg: "Sum(Quantity)" should not be in the Group By, and it will work on all of the "equal" records).


    The P1 and P2 are aliases, which are used to identify which 'copy' of the table you are working with. P1 refers to the plain copy (which is actually the original table), and P2 refers to the sub-query version (with the Group By etc).

    The "( SELECT pyStudent, ... ) as P2" part basically creates a temporary table called P2, which you then use in the rest of the query... so the Join is the same as it would be if the P2 table had been created separately (except then it would be written as "INNER JOIN P2 on ...").

    To be honest the P1 alias isn't needed (you could just use tblPay), but using it makes what is happening a bit more obvious, and therefore easier to understand/maintain.

  5. #5
    Fanatic Member
    Join Date
    Dec 07
    Location
    Albacete, españa
    Posts
    579

    Re: How to detect and list all records in a table that are duplicated.

    Thank you. Calling P1 and P2 aliases made it very clear what was happening.
    A fun card game written in VBA within Excel Tri Peaks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •