Enjoy an ad free experience by logging in. Not a member yet? Register.

Results 1 to 8 of 8
Thread: SQL Duplicates

03052009, 06:37 PM #1
 Join Date
 Mar 2009
 Posts
 2
 Thanks
 0
 Thanked 0 Times in 0 Posts
SQL Duplicates
Hello,
I am building a 'waterfall' type report where I need to report duplicates based on multiple criteria.
For example, suppose I can concatenate 5 columns to form a key. I would want to report the total number of duplicates with this concatenated key.
That is easy, I can do that.
Now suppose 3 of those 5 columns form another key where I need to also report duplicates. There will not be a one to one relationship between either key.
I am trying to solve this within one SQL statement, and avoid any procedurel logic. I am confident I can come up with some way to hack this together, but I would prefer to keep it all within SQL.
If anyone has ever created such a report, please share a solution. Thanks.
Example:
Col1 Col2 Col3 Col4 Col5 Col6
A 1 6 X Y Z < full duplicate
A 1 6 X Y Z < full duplicate
A 1 6 $ $ $ < partial duplicate
B 2 7 X Y J < full duplicate
B 2 7 X Y B < full duplicate
B 2 7 $ $ $ < partial duplicate
C 4 7 X Y Z
C 4 8 X Y Z
D 5 8 X Y Z
E 5 8 X Y Z
Report Partial Duplicate Full Duplicate Total Duplicate
A 1 6 1 2 3
B 2 7 1 2 3
C 4 7 0 0 0
C 4 8 0 0 0
D 5 8 0 0 0
E 5 8 0 0 0
03052009, 11:10 PM
#2
Can you please put [ code ] ... [ /code ] tags around your tables so we can see what you are after. Especially around the second table. I'm not sure what column name your various values are supposed to go with for that one.
03052009, 11:12 PM
#3
Also, you need to define "partial duplicate".
I can see that
A 1 6 X Y Z
A 1 6 L M N
are partial duplicates. But what about
A 1 6 X Y Z
B 2 3 X Y Z
???? that also has 3 field matches.
Or even
A 1 6 X Y Z
B 1 5 X Q Z
again, 3 matches.
Hmmm????
03062009, 02:56 AM
#4
 Join Date
 Mar 2009
 Posts
 2
 Thanks
 0
 Thanked 0 Times in 0 Posts
Such a rookie mistake. Here goes a better attempt.
Col 15 would be a 'full' key. Col 13 would be a 'partial' key. I am trying to devise a way to do this report within one SQL statement (or maybe two) and avoid anything procedural.
I don't have anything against using Perl or some other language to collate this all together, I'm just trying to build on some SQL knowledge. Thanks.
Code:Col1  Col2  Col3  Col4  Col5  Col6   A  1  6  X  Y  1  < full duplicate A  1  6  X  Y  2  < full duplicate A  1  6  $  $  $  < partial duplicate B  2  7  X  Y  J  < full duplicate B  2  7  X  Y  B  < full duplicate B  2  7  $  $  $  < partial duplicate C  4  7  X  Y  Z  C  4  8  X  Y  Z  D  5  8  X  Y  Z  E  5  8  X  Y  Z  Col1  Col2  Col3  Col4 Col5Partial Dup Count  Full Dup Count  Total Dup Count  A  1  6  X  Y  1  2  3  B  2  7  X  Y  1  2  3  C  4  7  X  Y  0  0  0  C  4  8  X  Y  0  0  0  D  5  8  X  Y  0  0  0  E  5  8  X  Y  0  0  0 
03062009, 07:16 AM
#5
I still don't quite see it...
Consider: Your third row with Col1=A and your third row with Col1=B could *ALSO* be considered "master" rows.
I don't know how you would avoid getting a report that looks like this:
Either those records with those counts are going to appear or you need *ANOTHER* rule to exclude them. (For example, "don't report records that have no full dupes at all and that have already been counted as a partial dupe in some other record"??? though that rule is a toughie, I think.)Code:Col1  Col2  Col3  Col4 Col5Partial Dup Count  Full Dup Count  Total Dup Count  A  1  6  X  Y  1  2  3  A  1  6  $  $  2  0  2  B  2  7  X  Y  1  2  3  B  2  7  $  $  2  0  2  etc.
03092009, 10:09 AM
#6
 Join Date
 Aug 2008
 Posts
 22
 Thanks
 0
 Thanked 2 Times in 2 Posts
This should solve your problem
tbla = your original tableCode:select tblb.*, sum(if(concat(tbla.Col1,tbla.Col2,tbla.Col3)=concat(tblb.Col1,tblb.Col2,tblb.Col3) and concat(tbla.Col1,tbla.Col2,tbla.Col3,tbla.Col4,tbla.Col5)<>concat(tblb.Col1,tblb.Col2,tblb.Col3,tblb.Col4,tblb.Col5),1,0)) as `Full Dup Count`, sum(if(concat(tbla.Col1,tbla.Col2,tbla.Col3,tbla.Col4,tbla.Col5)=concat(tblb.Col1,tblb.Col2,tblb.Col3,tblb.Col4,tblb.Col5),1,0)) as `Full Dup Count`, sum(if(concat(tbla.Col1,tbla.Col2,tbla.Col3)=concat(tblb.Col1,tblb.Col2,tblb.Col3),1,0)) as `Total Dup Count` from tbla left join (select Col1, Col2, Col3, Col4, Col5 from tbla group by Col1,Col2,Col3) tblb on (tbla.Col1=tblb.Col1 and tbla.Col2=tblb.Col2 and tbla.Col3=tblb.Col3) group by tblb.Col1,tblb.Col2,tblb.Col3,tblb.Col4,tblb.Col5
The Full Dup Count and Total Dup Count should be checked if its 1 then print 0
Last edited by riwan; 03092009 at 10:19 AM.
03102009, 12:16 AM
#7
It can be coded simpler than that, just using a UNION.
But that still doesn't answer my question/objection about the "extra" rows with partialmacthonly counts when those rows have already been counted in the fullmatch counts.
03102009, 03:29 PM
#8
 Join Date
 Aug 2008
 Posts
 22
 Thanks
 0
 Thanked 2 Times in 2 Posts
I thought its already clear in his post what he wanted.
He already showed the wanted result.
which would be grouping the same 3 first col.
At least the sql query I write above already omitted the extra rows that you said should be showing
Last edited by riwan; 03102009 at 03:32 PM.