Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 12 of 12
  1. #1
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,537
    Thanks
    45
    Thanked 259 Times in 256 Posts

    Could use advice/thoughts on bidirectional relationship

    I was thinking about this earlier, and I'm not quite sure how its done, or what the thought process behind it is, but...

    If you have a bidirectional relationship, such as facebook friends, whats the mindset behind creating the table to represent that? My initial thought was a table with basically userID and friendID, where both are foreign keys pointing to a user table. But then I realized if you did that, you'd either have to create two entries representing the relationship in both directions, which would be redundant data, or you'd have to write the query to search both columns.

    Is there another technique I don't know of? Am I on the right path? Is there another reasoning I don't know?

  • #2
    Regular Coder Apothem's Avatar
    Join Date
    Mar 2008
    Posts
    380
    Thanks
    36
    Thanked 25 Times in 25 Posts
    I believe one thing I saw a forum do in their MySQL structure is have a "friendid" column that contains a list of user ids that are friends of the user (separated by some deliminator, such as "," or " "). You would just load the user row, get the entry, split the list of ids, and do whatever you want with them. Of course, this isn't literally bidirectional (as your solution isn't either), but you can make it seem bidirectional by making sure each added friend has the user's id in their "friendid" column.
    Last edited by Apothem; 12-29-2011 at 07:07 PM.

  • #3
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,561
    Thanks
    80
    Thanked 4,492 Times in 4,456 Posts
    Oh, PLEASE do not even *THINK* of using Apothem's "solution"!!!

    That's the worst advice you could get.

    Sorry, but true. *NEVER* store a delimited list in a single field in *ANY* relational database.

    Just for starters (and I mean that), how would you then use SQL to answer a simple question such as "How many users have 'John Doe' in their FRIENDS list?"

    Oh, you can code it, but it will be a really ugly query and it will be slow, as there is no way to take advantage of any indexing.

    But now give a shot at coding "How many users have 'John Doe' in their FRIENDS list and, in addition, have at least 4 other friends in common?"

    We'll wait for you to give us an answer sometime in 2013.

    BLECH.
    Last edited by Old Pedant; 12-29-2011 at 07:59 PM.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #4
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,561
    Thanks
    80
    Thanked 4,492 Times in 4,456 Posts
    The right answer is the usual one: NORMALIZATION.

    A simple MANY-TO-MANY table:
    Code:
    CREATE TABLE friends (
        userid INT REFERENCES users(userid),
        friendid INT REFERENCES users(userid),
        PRIMARY KEY (userid,friendid)
    );
    Whether you store "John is a friend of Mary" as one row in that table or as two (that is, so you also have "Mary is a friend of John") is up to you. The extra space needed is miniscule (8 bytes or so, per friend link) and it does simplify queries. But it's really no big deal to join to the friends table via both fields:
    Code:
    SELECT username
    FROM users, friends 
    WHERE users.userid IN ( friends.userid, friends.friendid )
    If you do that, you would surely want to also have a separate (non-unique) index on the friendid field. And that could quite possibly eat up most of the space advantage of not storng the friend relationship both ways.

    Anyway, one way or both ways is not a really important decision. You could always add the other way later if you saw more reasons for it. But either one is orders of magnitude better than storing a delimited list.

    And, by the by, answering a question such as ""How many users have 'John Doe' in their FRIENDS list and, in addition, have at least 4 other friends in common?" is really trivial now, especially if you store both directions of the friend relationship (but only a tiny bit more complex if you don't).
    Last edited by Old Pedant; 12-29-2011 at 08:00 PM.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    Keleth (12-31-2011)

  • #5
    Regular Coder Apothem's Avatar
    Join Date
    Mar 2008
    Posts
    380
    Thanks
    36
    Thanked 25 Times in 25 Posts
    "How many users have 'John Doe' in their FRIENDS list?"
    Well we know the OP wants a bidirectional relationship between two users. If you suppose that he implements it correctly, the total list of user IDs in John Doe's freindid is the total people who have John Doe in their friend list. Though the process of getting to that point may be (MUCH) longer than what you have said.

    As for the other question, that's a really good question. I have absolutely no idea. Such a feature would definitely, at least from my knowledge, take many queries.

    Though to my inexperienced mind your solution and OP's initial solution are fairly similar. I have not played with many functionality that InnoDB (I only played with MyISAM) provides, so I am just ignorant of how much more power things like references bring. What does references really do? What is its benefits?

  • #6
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,561
    Thanks
    80
    Thanked 4,492 Times in 4,456 Posts
    It's not just the REFERENCES capability that is in question here.

    You can use REFERENCES with MyISAM, but MyISAM won't enforce the qualities as does INNODB. INNODB, along with most any decent database engine (even Microsoft's "toy" database Access does this) *enforces* referential integrity. That means that if you try, in this example, to add a friendid of 9999 and there is not userid in the users table with that value, then you get an error. Similarly, if you tried to delete a record from the users table that had a userid that is in use as a foreign key (REFERENCES, in your words) in another table, you get an error and the delete is disallowed.

    All foreign keys must refer to existing primary keys at all times.

    On top of that, you can optionally add characteristics to foreign keys. For example, CASCADE DELETE. When that is declared on a foreign key (REFERENCES link), if the record with the primary key is deleted, then the *ALL* records with the matching foreign key are *automatically* also deleted. Referential integrity is thus still enforced. (There is also a CASCADE UPDATE option which is much less used.)

    ************

    But, again, it's not the referential integrity that is the important part in this case. Or at least not the most important part.

    In *ANY* database, NORMALIZATION is the most important part.

    If you aren't familiar with it, GOOGLE is your friend.

    FWIW, to find out all users that 'John Doe' is a friend of using your scheme, you would need to code something like this:
    Code:
    SELECT * FROM friends 
    WHERE CONCAT(',', listOfFriends, ',') LIKE CONCAT( '%,', $johns_userid, ',%' )
    (That assumes that you use comma for the delimiter in the list...use any non digit character if you wish.)

    First of all LIKE is about the slowest operation that MySQL (or any DB) can do. It's complex and requires character-by-character scanning of fields. In the case of MySQL, it also can *NOT* be perfomed using an index. MySQL will only do it by scanning every single record in the given table.

    Compare that to this:
    Code:
    SELECT u.username
    FROM users AS u, friends AS f
    WHERE u.userid = f.userid
    AND f.friendid = $johns_userid
    Assuming that the field friends.friendid is indexed, that query is as efficient as it gets in any database.

    NOTE: Actually, the code equivalent to the version using the list of friends would simply be
    Code:
    SELECT * FROM friends
    WHERE friendid = $johns_userid
    Last edited by Old Pedant; 12-29-2011 at 09:01 PM.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    Apothem (12-29-2011)

  • #7
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,561
    Thanks
    80
    Thanked 4,492 Times in 4,456 Posts
    "How many users have 'John Doe' in their FRIENDS list and, in addition, have at least 4 other friends in common?"

    Code:
    SELECT U1.username AS user1, 
           U2.username AS user2, 
           COUNT(*) AS mutualFriends,
           SUM( IF(U1.userid=$johns_userid,1,0) ) AS johnIsFriend
    FROM users AS U1, users AS U2, friends AS F1, friends AS F2
    WHERE U1.userid = F1.userid
      AND U2.userid = F2.userid
      AND U1.friendid = U2.friendid
    GROUP BY U1.username, U2.username
    HAVING johnIsFriend > 0 AND mutualFriends >= 5
    Untested, but I think that's right. And really pretty simple *AND* should be quite fast.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #8
    Regular Coder Apothem's Avatar
    Join Date
    Mar 2008
    Posts
    380
    Thanks
    36
    Thanked 25 Times in 25 Posts
    I see. Very interesting. When I continue to program a project that uses a database, I'll definitely look up normalization for the type of database I will be working on.

    ----------------

    What I was thinking of (i.e. "my method") was something like this:
    Code:
    SELECT friendid FROM users WHERE userid = $uid
    Then lets say you're using PHP and you retrieved the result, and that friendid is not null:
    Code:
    $results = <get cell> // I can't remember it off the top of my head
    $friends = explode(",", $results);
    sizeof($friends); // total friends
    Just a single query; fast and efficient.

    Suppose you also want to get all the names like you have:
    Code:
    $list = implode("','", $friends);
    Code:
    SELECT username FROM users WHERE userid IN('{$list}')
    If userid is index, as they normally are, it would still be relatively fast right?

    Of course you can immediately see that my solution adds a layer of indirection, while yours does not.

    Edit: To be quite blatant, my mind cannot break down your second (longer) query :P
    Last edited by Apothem; 12-29-2011 at 09:10 PM.

  • #9
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,561
    Thanks
    80
    Thanked 4,492 Times in 4,456 Posts
    Your solution requires that the *ENTIRE* users table (well, at least one field in each record) be copied from MySQL to PHP where you would then do the filtering in PHP code.

    If you have a miniscule number of records, and your site is not very busy, that's an okay solution.

    But if you had a million (or 100 million!) users, a la Facebook, that solution would be disastrous.

    When at all possible, filtering of records should be done in the database, not in PHP (or any other database client).

    Remember, the database engine runs in a DIFFERENT PROCESS than the web server. And MySQL could even run on a different *computer* than the web server--almost always true in shared hosting systems [and I know it's true with GoDaddy, for example].

    So for the web server (which includes PHP) to get data from the database, it has to *COPY* the data *FROM THE OTHER PROCESS OR COMPUTER* into the process space of the web server.

    When the two are on the same machine, this will usually be done via pipes or shared memory. Reasonably efficient, but it still involved the DB engine putting the data from its internal buffers into one end of the pipe (or shared memory) and then PHP pulling the data from the pipe/shared memory and putting it into PHP's internal buffers. That is *TWO* EXTRA operations on every byte of data. When the two servers are on different machines, the data has to actually be sent across a *wire*, which (by the nature of inter-machine operations) could entail another two extra copy operations, not to mention the delay in converting bytes to bits to send them and from bits to bytes to receive them. INTER PROCESS COMMUNICATIONS are expensive and should be kept to a minimum.

    Little of this matters a whole lot in toy websites. If you don't have more than, say, 1000 web hits per day then code things any way you wish. But when you get up to hundreds of hits per minute, you'll be sorry if you didn't pay attention to the bits and bytes.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    Apothem (12-29-2011)

  • #10
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,561
    Thanks
    80
    Thanked 4,492 Times in 4,456 Posts
    Also, this solution is wrong:
    Code:
    $list = implode("','", $friends);
    ...
    SELECT username FROM users WHERE userid IN('{$list}')
    Oh, it will work. But assuming userid is an INT field, as it should be, then the apostrophes are unneeded *AND* they will slow up MySQL (admittedly by a tiny amount) as it converts each string to a number before comparing to the int value of userid.

    If you must use that solution, just use:
    Code:
    $list = implode(",", $friends);
    
    SELECT username FROM users WHERE userid IN({$list})
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    Apothem (12-29-2011)

  • #11
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,537
    Thanks
    45
    Thanked 259 Times in 256 Posts
    As always Old Penant, you are a source of wisdom.

    Ok, so whether I store one relationship and join on both columns or create both relationships really just falls down to personal taste and how else it interacts with my code, if I understand you correctly. In that case, I was on somewhat the right track. Thanks as always.

  • #12
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,561
    Thanks
    80
    Thanked 4,492 Times in 4,456 Posts
    I personally feel that storing both directions will be easier and probably perform better in the long run, but I doubt that it matters much until you get up into the many thousands of users.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •