Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    New Coder
    Join Date
    May 2012
    Posts
    89
    Thanks
    51
    Thanked 0 Times in 0 Posts

    Tagging system set up - which is better between these two?

    Gidday guys

    If I'm a) limiting the amount of tags stored per bookmark to two, and b) only allowing users to use either none or one tag per search...

    ...I'm wondering which would make for faster queries out of the following two systems:

    The "Toxi" system:




    ...or...

    BookmarkID | Blah | Tag1 | Tag2 |

    (so a query where a user has chosen to include a tag in their search would include....

    WHERE BookmarkID = $bookmark
    AND (Tag1 = $tag OR Tag2 = $tag)

    (with columns Tag1 and Tag2 indexed)

    Or, is there a better solution nowadays?

    Thanks for your guidance.

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,211
    Thanks
    80
    Thanked 4,571 Times in 4,535 Posts
    What is "Toxi"? Where did that come from?

    Wherever, it's also the properly normalized version.

    If you truly have the limitations you specified and you 100%-promise-to-delete-the-entire-site-if-you-change-your-mind-kind-of-sincerity that those limitations won't changed, then a little bit of denormalization as you propose would surely not hurt.

    Would it help? I tend to doubt it.

    I would like to point out that if you do denormalize then there are several possible future queries that get tougher. Example:
    "How many bookmarks use both tag13 and tag93?"
    [WHERE (tag1=13 AND tag2=93) OR (tag1=93 AND tag2=13)]
    [ugh!]

    "How many bookmarks have only one tag?"
    [WHERE (tag1 IS NOT NULL AND tag2 IS NULL) OR (tag1 IS NULL AND tag2 IS NOT NULL]
    [double ugh!]

    "How many bookmarks use exactly the same two tags as another bookmark?"
    [much more code than I want to show]

    All those--and more--are trivial in the fully normalized ("Toxi") case.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    shaunthomson (02-07-2013)

  • #3
    New Coder
    Join Date
    May 2012
    Posts
    89
    Thanks
    51
    Thanked 0 Times in 0 Posts
    Cheers Old Pedant

    Why would I need to delete everything if something changed? If I needed to add or delete a tag in the future, can I not just add or delete a column, and amend my php query?

    What is "Toxi"? Where did that come from?

    I found it in my research here:
    http://tagging.pui.ch/post/370277457...tabase-schemas



    I would like to point out that if you do denormalize then there are several possible future queries that get tougher. Example:
    "How many bookmarks use both tag13 and tag93?"
    [WHERE (tag1=13 AND tag2=93) OR (tag1=93 AND tag2=13)]
    [ugh!]


    I'm misunderstanding why I would need that query? If a user can only search by one tag eg 13, wouldn't...

    WHERE BookmarkID = $bookmark
    AND Tag1 = 13 OR Tag2 = 13

    ...do the job?


    "How many bookmarks have only one tag?"
    [WHERE (tag1 IS NOT NULL AND tag2 IS NULL) OR (tag1 IS NULL AND tag2 IS NOT NULL]
    [double ugh!]


    ...I'm thinking that if only one tag is inserted, I would always make sure it's in `tag1`, so again, wouldn't a search like above for 13 do the job?


    "How many bookmarks use exactly the same two tags as another bookmark?"
    [much more code than I want to show]

    I'm unsure what this means - are you getting at the fact that there would be double ups of the same tag pairs, so therefore extra data? Wouldn't this be offset by the fact that a "Toxi" type system would introduce roughly the same amount of extra data.

    I'm not meaning to be argumentative at all - just trying to understand the pros and cons, and pick the right system to give me the FASTEST results when querying.

    I would go for the "Toxi" type system, only that if would introduce another join into my queries that already have have one or sometimes two, so I'm researching to see which would be faster - another join, or a denormalized tag system.

    To round my questions out, I have other tagging conditions to add:

    When searching, user gets to choose...

    - one tag from condition one eg 15 (bookmark has two tag columns for this)
    - one tag from condition two eg 19(bookmark has one tag column for this)
    - one tag from condition three eg 2 (bookmark has one tag column for this)

    (queries would be dynamically built to include or exclude ANDs based on what conditions user choose to include)

    So what would return faster results...?

    WHERE bookmarkID = 456
    AND tagCond1A = 15 OR tagCond1B = 15
    AND tagCond2 = 19
    AND tagCond3 = 2

    or a join query with the "Toxi" type tag system?

    Speed of results is key for me.

    Thanks!!!
    Last edited by shaunthomson; 02-07-2013 at 04:50 PM.

  • #4
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,211
    Thanks
    80
    Thanked 4,571 Times in 4,535 Posts
    You know, a lot of this depends on how confident you are that you will *NEVER* want to do anything more than you are planning as of now.

    I tend to avoid locking myself into a design just because my experience has always been that the customer (or my boss...same thing, really) comes back 3 months (or 3 years) later and says, "You know, I really want to also do ..."

    And if I designed the DB right in the first place, I can say "trivial, let me just change this parameter." But if I did it wrong, then typically it means making changes to the entire DB schema. Which, even if it's not hard, can lead to hard-to-find errors weeks or months after then changes.

    You seem so supremely confident that your specifications will *NEVER* change that I would say it doesn't much matter which way you go. I do not think you will see a significant performance difference in this particular case between the fully normalized and the somewhat de-normalized versions. But the only way to really know is run benchmarks.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    shaunthomson (02-09-2013)

  • #5
    Master Coder felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, Australia
    Posts
    6,642
    Thanks
    0
    Thanked 649 Times in 639 Posts
    I would suggest that unless the benchmark tests indicate that the normalized version may have performance issues AND that the other version is SIGNIFICANTLY faster then there is no reason for undoing any of the normalizations and you should stick with the normalized version.
    Stephen
    Learn Modern JavaScript - http://javascriptexample.net/
    Helping others to solve their computer problem at http://www.felgall.com/

    Don't forget to start your JavaScript code with "use strict"; which makes it easier to find errors in your code.

  • Users who have thanked felgall for this post:

    shaunthomson (02-09-2013)

  • #6
    New Coder
    Join Date
    May 2012
    Posts
    89
    Thanks
    51
    Thanked 0 Times in 0 Posts
    OK.

    Dumb question - just to confirm, by normalized (in this case) you are refereing the the "Toxi" type system I included? ie bookmarks in one table, tags in another?

  • #7
    Master Coder felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, Australia
    Posts
    6,642
    Thanks
    0
    Thanked 649 Times in 639 Posts
    Never heard of the "Toxi" system - whatever it is has nothing to do with normalizing a database. Normalizing a database is called "Database Design".

    To normalize a database you need to design it so that:

    First normal form (1NF)
    Sets the very basic rules for an organized database:
    Eliminate duplicative columns from the same table.
    Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

    Your alternative breaks this first and most important rule.

    Second normal form (2NF)
    Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
    Create relationships between these new tables and their predecessors through the use of foreign keys.

    Third normal form (3NF)
    Remove columns that are not dependent upon the primary key.

    Boyce-Codd Normal Form (BCNF)
    Every determinant must be a candidate key.

    At this point you have taken care of functional dependencies - it is the minimal normalization that you should apply.

    Fourth normal form (4NF)
    Removes multi-value dependencies

    Fifth normal form (5NF)
    Every join dependency in the database is implied by the candidate keys

    At this point you have taken care of all join dependencies. This is usually as far as you need to go with normalization

    Sixth normal form or Domain/Key Normal Form(6NF/DKNF)
    decompose relation variables to irreducible components so that changes in values can be tracked over time
    Stephen
    Learn Modern JavaScript - http://javascriptexample.net/
    Helping others to solve their computer problem at http://www.felgall.com/

    Don't forget to start your JavaScript code with "use strict"; which makes it easier to find errors in your code.

  • Users who have thanked felgall for this post:

    shaunthomson (02-09-2013)

  • #8
    New Coder
    Join Date
    May 2012
    Posts
    89
    Thanks
    51
    Thanked 0 Times in 0 Posts
    OK - so this...



    ...seems to be 3NF, and mine is not even 1NF. I will try going with the example diagram. Unless you guys know of a better set up for tagging?

    Thanks for your help guys.

  • #9
    Master Coder felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, Australia
    Posts
    6,642
    Thanks
    0
    Thanked 649 Times in 639 Posts
    Are any of the fields in each table unique apart from the id? If so they you might make that the primary key and get rid of the id (unless the id has some meaning apart from providing a unique value to use as the key).
    Stephen
    Learn Modern JavaScript - http://javascriptexample.net/
    Helping others to solve their computer problem at http://www.felgall.com/

    Don't forget to start your JavaScript code with "use strict"; which makes it easier to find errors in your code.

  • Users who have thanked felgall for this post:

    shaunthomson (02-11-2013)

  • #10
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,211
    Thanks
    80
    Thanked 4,571 Times in 4,535 Posts
    I finally got around to looking at the URL you gave for the "Toxi" scheme.

    In case you didn't notice, "Toxi" is just the name (or handle) of a *PERSON* who submitted that scheme to the author of that page.

    The page is, quite frankly, ridiculous.

    Just for starters, he doesn't address the *HUGE* disadvantage that his "SQLlicious" scheme has. To wit: It CAN NOT take advantage of INDEXES. (It could do so using full text indexes, but those have problems of their own, such as finding matches where none exist, because they treat various forms of a word as the same as the main word.)

    It's essentially a useless scheme. ANY good database design book will warn you away from it.

    You seem enamored of the second scheme there, the "Scuttle" solution. I've already given my opinion of it. ANY CHANGES you may make in the future *WILL* require changes to your database schema. And to all the code you use to access the database. And... Again, if you really are 100% confident that you are never never going to change how things work, then this method could work.

    But I 100% agree with Felgall: The *ONLY* excuse for using this is if you find, via EXTENSIVE testing and benchmarking, that there are potential problems in the normalized scheme *AND* the unnormalized version performs significantly better. To me, here "significantly" would need to be at least one-third faster and I'd rather see at least one-half faster. Again, your tests and benchmarks should include variations in the indexes used. That is, make sure you are using the best indexes you can for MySQL.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    shaunthomson (02-11-2013)

  • #11
    New Coder
    Join Date
    May 2012
    Posts
    89
    Thanks
    51
    Thanked 0 Times in 0 Posts
    To be a pain...

    Is this an example of a normalised tagging system, or is it what you have been saying is not normalised...?



    Thanks again for your reply.

  • #12
    Master Coder felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, Australia
    Posts
    6,642
    Thanks
    0
    Thanked 649 Times in 639 Posts
    Quote Originally Posted by shaunthomson View Post
    Is this an example of a normalised tagging system, or is it what you have been saying is not normalised...?
    It is normalized but it may still contain unnecessary fields - depending on whether the name fields are unique within the tables and depending on the size of the name fields.

    If the name fields are unique and relatively small fields then none of the id fields are needed.

    I can't see the purpose for the id field on the tagmap table regardless as the other two fields together should be the primary key for that table.
    Stephen
    Learn Modern JavaScript - http://javascriptexample.net/
    Helping others to solve their computer problem at http://www.felgall.com/

    Don't forget to start your JavaScript code with "use strict"; which makes it easier to find errors in your code.

  • Users who have thanked felgall for this post:

    shaunthomson (02-11-2013)

  • #13
    New Coder
    Join Date
    May 2012
    Posts
    89
    Thanks
    51
    Thanked 0 Times in 0 Posts
    OK - that example was from the page I was researching

    Mine would be

    bookmark:
    userID
    itemID
    itemVersion

    tagmap:
    id
    bookmarkID
    tag_id

    tag:
    tag_id
    name

    all three columns in `bookmark` are together a unique key
    (itemVersion is for display data, it doesn't actually represent another item)

    So from what you are saying...

    - if the tag names are small enough, you could replace tag_id in tagmap with name (wouldn't this render the tag column useless?)
    - get rid of id in tagmap, and make bookmarkID/name the primary key in it

    @Old Pedant

    I trust your call, but am trying to get my head around how future changes would be more of a hassle with the "scuttle" solution?

    For example, to add in a new tag at a later date, wouldn't I need to update MORE tables?

    I'm a little confused, as, unless I'm misunderstand, Felgall's suggestion seem to revert the system back to a "scuttle" one, which is against what Old Pedant suggests.

    Thanks guys
    Last edited by shaunthomson; 02-11-2013 at 03:18 AM.

  • #14
    Master Coder felgall's Avatar
    Join Date
    Sep 2005
    Location
    Sydney, Australia
    Posts
    6,642
    Thanks
    0
    Thanked 649 Times in 639 Posts
    What I have been trying to say is that if the data already contains a relatively small field that contains a unique value for each record then you can use that for the primary key and not add an artificial ID field to use as the key. You should only add artificial fields as the key when there is either no natural key that can be used or the natural key is too long to be practical.

    A table can quite reasonably hold just one field where it is a list of those values that are valid in a field in another table.

    I don't think I have any id fields at all in any of the databases I have designed (I am aware of a few situations where they would be needed but have never actually designed any system that has the rare requirement of needing one).
    Stephen
    Learn Modern JavaScript - http://javascriptexample.net/
    Helping others to solve their computer problem at http://www.felgall.com/

    Don't forget to start your JavaScript code with "use strict"; which makes it easier to find errors in your code.

  • Users who have thanked felgall for this post:

    shaunthomson (02-11-2013)

  • #15
    New Coder
    Join Date
    May 2012
    Posts
    89
    Thanks
    51
    Thanked 0 Times in 0 Posts
    Thanks makes sense. Thanks Felgall.

    Can you use three columns that are unique together as the primary key, or is that not workable?

    If a table is just one column, how can it relate to any other tables - how do you create a relationship?


  •  
    Page 1 of 2 12 LastLast

    Tags for this Thread

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •