Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Page 2 of 2 FirstFirst 12
Results 16 to 21 of 21

Thread: dupicates check

  1. #16
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
    This is pretty much exactly what I was thinking as well, minus the staging table. I can see your point on this one, but treating it as staging then truncating it is a good idea.

    "Tombstone" records are IMO the right approach, where tombstoning it is based on whatever criteria you want. Users sessions expire if they've had no activity for 15 minutes, a article is old if its > 3 weeks, and in this case a record is inactive if isCurrentlyActive is 0 (does MySQL actually support boolean yet?). So its simply a matter of your criteria, and unless diskspace is a huge concern, there's really not a need to delete "expired" records (except for something like non-persisting sessions since you'll never reuse them).

  2. Users who have thanked Fou-Lu for this post:

    SlayerACC (01-10-2013)

  3. #17
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,551
    Thanks
    80
    Thanked 4,617 Times in 4,581 Posts
    MySQL allows BOOLEAN as an alias for BIT:
    Code:
    CREATE TABLE foo (
        flag BOOLEAN,
        flag2 BIT
    );
    But if you then use DESCRIBE foo; both fields will show up as BIT(1).

    It also allows the keywords TRUE and FALSE:
    Code:
    UPDATE foo SET flag = True WHERE id = 777;
    But, again, if you then do SELECT flag FROM foo you will see a 1 or 0 value for the BIT field.

    So they really just have 3 aliases in place: BOOLEAN for BIT(1), TRUE for 1, FALSE for 0.

    Whoopee.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  4. #18
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
    Quote Originally Posted by Old Pedant View Post
    Whoopee.
    Lols
    I'm happy with that, I like to use =TRUE and =FALSE if I have the option. Not that storage wise it makes a difference, but it reminds me that it should only be true or false.
    Come to think of it I seem to recall that true and false have worked for awhile. MySQL used to use the tinyint though, but I think it was much larger than a single bit even for its smallest size? Not that it matters.

    Still, its no different than booleans in PHP :P

  5. #19
    Regular Coder
    Join Date
    Sep 2009
    Location
    Calgary, Alberta
    Posts
    243
    Thanks
    51
    Thanked 3 Times in 3 Posts
    At the end of all this I think I am more confused then when I started..

    Should I use

    Insert ignore statement.


    or should I use
    UPDATE maintable SET isCurrentlyActive = 0;

    INSERT INTO maintable (primaryKeyField, field1, field2, field3, isCurrentlyActive )
    SELECT primaryKeyField, field1, field2, field3, 1 FROM stagingtable
    ON DUPLICATE KEY UPDATE isCurrentlyActive = 1;

    TRUNCATE TABLE stagingTable;
    I am lost....I would kill to know this much about this area.


    Slayer.

  6. #20
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,551
    Thanks
    80
    Thanked 4,617 Times in 4,581 Posts
    I *think* FouLu is agreeing with me. He wrote
    This is pretty much exactly what I was thinking as well, minus the staging table. I can see your point on this one, but treating it as staging then truncating it is a good idea.
    So... It's really a variation on your original plan.

    Yes, you have a separate table that you import the CSV file into.

    But you use your master table to keep *ALL* records *EVER* imported. And you use that flag, isCurrentlyActive (or any other name you prefer, of course), to indicate "live" records. That way, all your prior records are always there. And having all in a single table is *highly* preferable. For example, it means you can search both historical and current data in a single query by simply ignoring th isCurrentlyActive flag.

    Do you understand the steps involved in this:
    Code:
    UPDATE maintable SET isCurrentlyActive = 0;
    
    INSERT INTO maintable (primaryKeyField, field1, field2, field3, isCurrentlyActive )
    SELECT primaryKeyField, field1, field2, field3, 1 FROM stagingtable
    ON DUPLICATE KEY UPDATE isCurrentlyActive = 1;
    
    TRUNCATE TABLE stagingTable;
    It's really pretty simple:

    First, you import the CSV file into your "staging" table. (That step is not shown above.)

    Then, you change *ALL* records in the main table to indicate that they are historic. That is, they are *NOT* currently active.

    Now you merge your "staging" records into the main table. But the ON DUPLICATE KEY simply says "if I try to merge a record from staging into the main table *AND* the KEY for that record already exists in the main table, then forget about the merge and instead simply mark the record in the main table as *ACTIVE* again." Doesn't that make sense? You first made all the records inactive and then, on the merge, you "resurrect" any records that are being re-imported.

    Once the merge is done, you wipe out all the records in the staging table, ready to do it all again next week.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  7. Users who have thanked Old Pedant for this post:

    SlayerACC (01-10-2013)

  8. #21
    Regular Coder
    Join Date
    Sep 2009
    Location
    Calgary, Alberta
    Posts
    243
    Thanks
    51
    Thanked 3 Times in 3 Posts
    Thank you guys for taking sooooo much time on this..

    I will implement this and see how it all works.


    A big thanks to you both.


    Slayer.


 
Page 2 of 2 FirstFirst 12

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •