Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 14 of 14
  1. #1
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts

    Delete Duplicates

    I'm running the following query to gather a list of duplicates based on 2 fields....lnktxt and cattxt:

    SELECT P1.lnktxt, P1.cattxt, P1.dsptxt
    FROM TABLE AS P1,
    ( SELECT lnktxt, cattxt, dsptxt, COUNT(*) AS howmany
    FROM TABLE
    GROUP BY lnktxt, cattxt, dsptxt
    HAVING howmany > 1 ) AS P2
    WHERE P1.lnktxt = P2.lnktxt AND P1.cattxt = P2.cattxt

    It is displaying fine, and in PHPMyAdmin when I check the boxes to delete the individual records, it is not releasing the P1.lnktxt and the SQL server is erroring out. Is there a better method to gather and display a list of duplicates?
    I don't want them to be automatically deleted, I would like to review the records first and select which one to delete.

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,536
    Thanks
    80
    Thanked 4,490 Times in 4,454 Posts
    If you don't have something that uniquely identifies each record, you can't select which one to delete. In other words, if you don't have a PRIMARY KEY in your (very badly named) TABLE then you can't do this easily.

    If you do have a primary key, just SELECT it along with the other fields and *probably* PhpMyAdmin will be able to do it.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #3
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    If you don't have something that uniquely identifies each record, you can't select which one to delete. In other words, if you don't have a PRIMARY KEY in your (very badly named) TABLE then you can't do this easily.

    If you do have a primary key, just SELECT it along with the other fields and *probably* PhpMyAdmin will be able to do it.

    Well, I used TABLE just to mask the table name...so yes, I would suppose my ruse would be a badly named table.

    The reason I asked the question is I recently switched hosts and the previous hosts PHPMyAdmin allowed me to use the exact query (with the correct table name) with no problem, but the new hosts phpmyadmin does not.

    And no, I do not have a primary key defined as I was not the original creator of the database and now that I have 23k+ records, I was never sure of the implications of trying to define one now, and especially because there isn't a field that can be truly unique...a link in the lnktxt field would have the same dsptxt field info, but could have a multiple entries, each with a unique cattxt....I would almost have to create a brand new field and populate it with data just to be able to have a primary key.
    Last edited by mharrison; 02-18-2013 at 06:49 PM. Reason: Additional Info

  • #4
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,536
    Thanks
    80
    Thanked 4,490 Times in 4,454 Posts
    No no...it's easy!

    Just do this:
    Code:
    ALTER TABLE tablename ADD uniqueid INT AUTO_INCREMENT PRIMARY KEY;
    And that will not only add the field, make it a primary key, but it will also RIGHT THEN AND THERE populate *ALL* the existing records with the field!

    Presto.

    (If you want to, after removing the duplicates you could then remove the column, but I wouldn't bother.)
    Last edited by Old Pedant; 02-18-2013 at 08:47 PM.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #5
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Ok, so now my table has a primary key. When I use any simple query that counts the 2 fields I want to reference for duplicates, it will show me that the record has 2 or more matches, but it does not let me delete one of them.

    Example:
    SELECT lnktxt, cattxt, COUNT(*) c FROM tablenam GROUP BY lnktxt HAVING c > 1;

  • #6
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,536
    Thanks
    80
    Thanked 4,490 Times in 4,454 Posts
    You *MUST* also SELECT the primary key field!

    If you do that, in PHPMyAdmin, it should then be able to delete the records based on that primary key.

    I don't use PHPMyAdmin, but I could certainly build a web page that would allow this in under an hour, for example.

    Code:
    SELECT P1.uniqueid, P1.lnktxt, P1.cattxt, P1.dsptxt
    FROM TABLE AS P1,
    ( SELECT lnktxt, cattxt, dsptxt, COUNT(*) AS howmany
    FROM TABLE
    GROUP BY lnktxt, cattxt, dsptxt
    HAVING howmany > 1 ) AS P2
    WHERE P1.lnktxt = P2.lnktxt AND P1.cattxt = P2.cattxt AND P1.dsptxt = P2.dsptxt
    ORDER BY lnktxt, cattxt, dsptxt
    And if you are going to count two records as identical only if all THREE fields agree (lnktxt, cattxt, dsptxt -- which is the meaning of your GROUP BY and HAVING there), then you really should also include those in the WHERE clause, as shown.

    If you don't care about whether dsptxt is duplicated or not, then don't use it in the subquery.

    Also, if you don't specify an ORDER BY clause, there is no guarantee that identical (except for uniqueid) records will be even close together in the output.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #7
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Even with the modifications to the original query you listed, the problem I am having is not being able to find the records, it's being able to delete them. PHPMyAdmin will show me the records and I can select the records I want to delete, but when it writes a query to delete the records, it errors out because it is still trying to use P1.lnktxt and so on.

    Perhaps I should just document the uniqueid's and then write my own query to delete them that way.

    My frustration is that with the version of phpmyadmin on my old webhost, I could delete the records by selecting which ones I wanted to delete right from phpmyadmin, but with my new webhost, I cannot as it errors out.

  • #8
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,536
    Thanks
    80
    Thanked 4,490 Times in 4,454 Posts
    Well, like I said. Write your own code to do this. It's trivial. I don't use PHP, but it will be something as simple as this:
    Code:
    <form action="myDelete.php" method="post">
    <table border="1" cellpadding="3">
    <tr>
        <th>DELETE</th><th>Lnktxt</th><th>Cattxt</th><th>Dsptxt</th>
    </tr>
    <?php
    ... make your db connection ...
    $sql = "SELECT P1.uniqueid, P1.lnktxt, P1.cattxt, P1.dsptxt
            FROM TABLE AS P1,
            ( SELECT lnktxt, cattxt, dsptxt, COUNT(*) AS howmany
              FROM TABLE
              GROUP BY lnktxt, cattxt, dsptxt
              HAVING howmany > 1 ) AS P2
            WHERE P1.lnktxt = P2.lnktxt AND P1.cattxt = P2.cattxt AND P1.dsptxt = P2.dsptxt
            ORDER BY lnktxt, cattxt, dsptxt ";
    
    $result = mysql_query($sql);
    
    while ( row = mysql_fetch_assoc($result) )
    {
        $id = row["uniqueid"];
        echo '<tr><td><input type="checkbox" name="delete[]" value="' .$id . '" />' . $id . "</td>\n";
        echo "<td>...the other fields...</td></tr>\n";
    }
    ?>
    </table>
    <input type="submit" value="delete checked rows">
    </form>
    Now, as I said, I don't use PHP, but I *THINK* your "myDelete.php" page would be as simple as this:
    Code:
    <?php
    ...make db connection ...
    $sql = "DELETE FROM TABLE 
            WHERE uniqueid IN (" . implode(",",$_POST["delete"]) . ")";
    mysql_query($sql);
    ?>
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #9
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts
    The php code only errors out, but I'm going to approach this from a different angle.

    I have 3 columns I want to deal with here....
    lnktxt
    cattxt
    datetxt

    I want to find all records that are the same in lnktxt that also have the same value in cattxt. I then want to delete the record whose datetxt record is older (ie... delete 2012-11-12 rather than 2013-01-25)

    I can find the records but I am not sure how I would write the delete statement to be able to distinguish the date criteria, if it can even be done.

  • #10
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,536
    Thanks
    80
    Thanked 4,490 Times in 4,454 Posts
    ** SIGH **

    Why didn't you mention the DATETXT field *long* ago??

    Okay, now we only have one problem: Why is that field called dateTXT? That would seem to indicate to me that you are (foolishly) storing the date as a text (e.g., VARCHAR) value, rather than as a TIMESTAMP or DATETIME value.

    Is that so?

    If it is, then were you at least consistent in the format of the dates you stored in it?

    Can you select, say, 10 records and show what they all have in that field?

    On the other hand, if that field really is a TIMESTAMP or DATETIME field, and only the name is misleading, then we are ready to go.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #11
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts
    It is a DATE value field...I didn't name it and haven't bother to change it yet due to other concerns with getting the website fully functional...
    Last edited by mharrison; 02-22-2013 at 07:23 PM.

  • #12
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,536
    Thanks
    80
    Thanked 4,490 Times in 4,454 Posts
    Perfect then.

    And trivial:
    Code:
    DELETE t1 
    FROM tablename AS t1, tablename AS t2
    WHERE t1.lnktxt = t2.lnktxt
      AND t1.cattxt = t2.cattxt
      AND t1.datetxt < t2.datetxt
    ALSO: If you happen to have 3 (or more!) records with the same lnktxt and cattxt values, that will delete *all* of them except the one with the latest datetxt field.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    mharrison (02-22-2013)

  • #13
    New Coder
    Join Date
    Dec 2012
    Posts
    54
    Thanks
    12
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    Perfect then.

    And trivial:
    Code:
    DELETE t1 
    FROM tablename AS t1, tablename AS t2
    WHERE t1.lnktxt = t2.lnktxt
      AND t1.cattxt = t2.cattxt
      AND t1.datetxt < t2.datetxt
    ALSO: If you happen to have 3 (or more!) records with the same lnktxt and cattxt values, that will delete *all* of them except the one with the latest datetxt field.
    That works wonderfully! One final question on this mission....is there any way to modify it so if it runs across an instance where all 3 fields are the same, it can drop all but one?

  • #14
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,536
    Thanks
    80
    Thanked 4,490 Times in 4,454 Posts
    Yes, if you added that AUTO_INCREMENT field.

    Say you named it "uniqueid".

    Then:
    Code:
    DELETE t1 
    FROM tablename AS t1, tablename AS t2
    WHERE t1.lnktxt = t2.lnktxt
      AND t1.cattxt = t2.cattxt
      AND t1.datetxt = t2.datetxt
      AND t1.uniqueid < t2.uniqueid
    But run the other query first, so that you get rid of the ones that *DO* differ by date. And only then run this one as a sort of cleanup.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    mharrison (02-22-2013)


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •