Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 8 of 8

Thread: SQL Duplicates

  1. #1
    New to the CF scene
    Join Date
    Mar 2009
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts

    SQL Duplicates

    Hello,
    I am building a 'waterfall' type report where I need to report duplicates based on multiple criteria.

    For example, suppose I can concatenate 5 columns to form a key. I would want to report the total number of duplicates with this concatenated key.

    That is easy, I can do that.

    Now suppose 3 of those 5 columns form another key where I need to also report duplicates. There will not be a one to one relationship between either key.

    I am trying to solve this within one SQL statement, and avoid any procedurel logic. I am confident I can come up with some way to hack this together, but I would prefer to keep it all within SQL.

    If anyone has ever created such a report, please share a solution. Thanks.

    Example:

    Col1 Col2 Col3 Col4 Col5 Col6
    A 1 6 X Y Z <-- full duplicate
    A 1 6 X Y Z <-- full duplicate
    A 1 6 $ $ $ <-- partial duplicate
    B 2 7 X Y J <-- full duplicate
    B 2 7 X Y B <-- full duplicate
    B 2 7 $ $ $ <-- partial duplicate
    C 4 7 X Y Z
    C 4 8 X Y Z
    D 5 8 X Y Z
    E 5 8 X Y Z


    Report Partial Duplicate Full Duplicate Total Duplicate
    A 1 6 1 2 3
    B 2 7 1 2 3
    C 4 7 0 0 0
    C 4 8 0 0 0
    D 5 8 0 0 0
    E 5 8 0 0 0

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,232
    Thanks
    80
    Thanked 4,456 Times in 4,421 Posts
    Can you please put [ code ] ... [ /code ] tags around your tables so we can see what you are after. Especially around the second table. I'm not sure what column name your various values are supposed to go with for that one.

  • #3
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,232
    Thanks
    80
    Thanked 4,456 Times in 4,421 Posts
    Also, you need to define "partial duplicate".

    I can see that

    A 1 6 X Y Z
    A 1 6 L M N

    are partial duplicates. But what about

    A 1 6 X Y Z
    B 2 3 X Y Z

    ???? that also has 3 field matches.

    Or even

    A 1 6 X Y Z
    B 1 5 X Q Z

    again, 3 matches.

    Hmmm????

  • #4
    New to the CF scene
    Join Date
    Mar 2009
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Such a rookie mistake. Here goes a better attempt.

    Col 1-5 would be a 'full' key. Col 1-3 would be a 'partial' key. I am trying to devise a way to do this report within one SQL statement (or maybe two) and avoid anything procedural.

    I don't have anything against using Perl or some other language to collate this all together, I'm just trying to build on some SQL knowledge. Thanks.

    Code:
    Col1 | Col2 | Col3 | Col4 | Col5 | Col6  |
     -----|------|------|------|------|------|
        A |    1 |    6 |    X |    Y |    1 |  <-- full duplicate 
        A |    1 |    6 |    X |    Y |    2 |  <-- full duplicate
        A |    1 |    6 |    $ |    $ |    $ |  <-- partial duplicate
        B |    2 |    7 |    X |    Y |    J |  <-- full duplicate
        B |    2 |    7 |    X |    Y |    B |  <-- full duplicate
        B |    2 |    7 |    $ |    $ |    $ |  <-- partial duplicate
        C |    4 |    7 |    X |    Y |    Z |  
        C |    4 |    8 |    X |    Y |    Z |  
        D |    5 |    8 |    X |    Y |    Z |  
        E |    5 |    8 |    X |    Y |    Z |   
    
    
     Col1 | Col2 | Col3 | Col4| Col5|Partial Dup Count | Full Dup Count | Total Dup Count|  
     -----|------|------|-----|-----|------------------|----------------|----------------|
     A    |    1 |    6 |   X |   Y |               1  |              2 |              3 |
     B    |    2 |    7 |   X |   Y |               1  |              2 |              3 |
     C    |    4 |    7 |   X |   Y |               0  |              0 |              0 |
     C    |    4 |    8 |   X |   Y |               0  |              0 |              0 |
     D    |    5 |    8 |   X |   Y |               0  |              0 |              0 |
     E    |    5 |    8 |   X |   Y |               0  |              0 |              0 |

  • #5
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,232
    Thanks
    80
    Thanked 4,456 Times in 4,421 Posts
    I still don't quite see it...

    Consider: Your third row with Col1=A and your third row with Col1=B could *ALSO* be considered "master" rows.

    I don't know how you would avoid getting a report that looks like this:
    Code:
    Col1 | Col2 | Col3 | Col4| Col5|Partial Dup Count | Full Dup Count | Total Dup Count|  
     -----|------|------|-----|-----|------------------|----------------|----------------|
     A    |    1 |    6 |   X |   Y |               1  |              2 |              3 |
     A    |    1 |    6 |   $ |   $ |               2  |              0 |              2 |
     B    |    2 |    7 |   X |   Y |               1  |              2 |              3 |
     B    |    2 |    7 |   $ |   $ |               2  |              0 |              2 |
    etc.
    Either those records with those counts are going to appear or you need *ANOTHER* rule to exclude them. (For example, "don't report records that have no full dupes at all and that have already been counted as a partial dupe in some other record"??? though that rule is a toughie, I think.)

  • #6
    New Coder
    Join Date
    Aug 2008
    Posts
    22
    Thanks
    0
    Thanked 2 Times in 2 Posts
    This should solve your problem

    Code:
    select tblb.*,
    sum(if(concat(tbla.Col1,tbla.Col2,tbla.Col3)=concat(tblb.Col1,tblb.Col2,tblb.Col3) and concat(tbla.Col1,tbla.Col2,tbla.Col3,tbla.Col4,tbla.Col5)<>concat(tblb.Col1,tblb.Col2,tblb.Col3,tblb.Col4,tblb.Col5),1,0)) as `Full Dup Count`,
    sum(if(concat(tbla.Col1,tbla.Col2,tbla.Col3,tbla.Col4,tbla.Col5)=concat(tblb.Col1,tblb.Col2,tblb.Col3,tblb.Col4,tblb.Col5),1,0)) as `Full Dup Count`,
    sum(if(concat(tbla.Col1,tbla.Col2,tbla.Col3)=concat(tblb.Col1,tblb.Col2,tblb.Col3),1,0)) as `Total Dup Count` from tbla 
    left join (select Col1, Col2, Col3, Col4, Col5 from tbla group by Col1,Col2,Col3) tblb 
    on (tbla.Col1=tblb.Col1 and tbla.Col2=tblb.Col2 and tbla.Col3=tblb.Col3) 
    group by tblb.Col1,tblb.Col2,tblb.Col3,tblb.Col4,tblb.Col5
    tbla = your original table

    The Full Dup Count and Total Dup Count should be checked if its 1 then print 0
    Last edited by riwan; 03-09-2009 at 10:19 AM.

  • #7
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,232
    Thanks
    80
    Thanked 4,456 Times in 4,421 Posts
    It can be coded simpler than that, just using a UNION.

    But that still doesn't answer my question/objection about the "extra" rows with partial-macth-only counts when those rows have already been counted in the full-match counts.

  • #8
    New Coder
    Join Date
    Aug 2008
    Posts
    22
    Thanks
    0
    Thanked 2 Times in 2 Posts
    I thought its already clear in his post what he wanted.
    He already showed the wanted result.
    which would be grouping the same 3 first col.
    At least the sql query I write above already omitted the extra rows that you said should be showing
    Last edited by riwan; 03-10-2009 at 03:32 PM.


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •