Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 7 of 7
  1. #1
    Super Moderator JohnDubya's Avatar
    Join Date
    Nov 2006
    Location
    Missouri
    Posts
    634
    Thanks
    12
    Thanked 18 Times in 18 Posts

    LEFT JOIN 2 tables with multiple ON fields

    Ugh. Once again, I've gotten stumped on a complex query. I'll try to make it as simple as possible. Basically, we have a DB of people's information, and we allow for our clients to add custom categories to be able to add additional fields themselves. So for each custom category, there are multiple options to choose from. And each person can be associated with one of these options under each custom_field. Here is the DB structure:

    TABLES

    `custom_fields` => (field_id, name) [this table isn't used in the query, but here to show DB structure]

    `custom_options` => (option_id, field_id, value)

    `custom_attributes` => (option_id, person_id)

    `person` => (person_id, name)


    PROBLEM

    I need to filter on these things:

    • custom_options.field_id = 4
    • custom_options.option_id = custom_attributes.option_id
    • custom_attributes.person_id = p.person_id


    And in addition to needing to filter on these things, I also need the query to show people who do NOT have an option entry in `custom_attributes` for field_id = 4. So it would basically need to return NULL in that case but still show the record. Here's what I've got so far:

    PHP Code:
    $query "SELECT cuo1.value value FROM ( (person p) 
          LEFT JOIN custom_attributes cua1 ON cua1.person_id = p.person_id 
          LEFT JOIN custom_options cuo1 ON cuo1.option_id = cua1.option_id ) 
          WHERE ( (cuo1.field_id = 4 OR cuo1.field_id IS NULL) OR 
          (NOT EXISTS (SELECT 1 FROM custom_attributes tcua1 
          JOIN custom_options tcuo1 ON tcuo1.field_id = 4 
          WHERE tcua1.person_id = p.person_id AND tcua1.option_id = tcuo1.option_id)) ) 
          GROUP BY p.person_id ORDER BY p.name"

    This query correctly returns 14 rows, but the value is not completely correct. It returns the correct option value for people who only have a `custom_person` entry for an option under field_id = 4, but if they have additional `custom_person` entries, one of them is sometimes returned too.

    So how do I get this query to ONLY grab the option's value for a field_id = 4? Thanks for any help you can provide!
    Last edited by JohnDubya; 05-15-2009 at 04:56 PM. Reason: Resolved

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,211
    Thanks
    80
    Thanked 4,571 Times in 4,535 Posts
    Any time you use a LEFT JOIN and then use fields from the right-side table in the WHERE clause, you have effectively converted the query into an INNER JOIN.

    You *MUST* make selections that are dependent on right-side tables as part of the ON clause.
    Code:
    SELECT cuo1.value 
    FROM person p 
    LEFT JOIN custom_attributes cua1 
            ON cua1.person_id = p.person_id 
    LEFT JOIN custom_options cuo1 
            ON ( cuo1.option_id = cua1.option_id AND (cuo1.field_id = 4 OR cuo1.field_id IS NULL) )
             ... etc. ...
    I'm not sure I follow all the rest of that query, yet, so I'll let you play with it a while.

    But suffice to say you shouldn't be usieng ANY WHERE clause in there, except possibly in the inner SELECT.

  • Users who have thanked Old Pedant for this post:

    JohnDubya (05-15-2009)

  • #3
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,211
    Thanks
    80
    Thanked 4,571 Times in 4,535 Posts
    You know, given *ONLY* the conditions you stated in the beginning of your post, it looks to me like this might do it:
    Code:
    SELECT p.person_id, p.name, cuo1.value 
    FROM person p 
    LEFT JOIN custom_attributes cua1 
            ON cua1.person_id = p.person_id 
    LEFT JOIN custom_options cuo1 
            ON ( cuo1.option_id = cua1.option_id AND cuo1.field_id = 4 )
    GROUP BY p.person_id, p.name 
    ORDER BY p.name
    Or, if you need more info from the person table, why not just
    Code:
    SELECT DISTINCT p.*, cuo1.value 
    FROM person p 
    LEFT JOIN custom_attributes cua1 
            ON cua1.person_id = p.person_id 
    LEFT JOIN custom_options cuo1 
            ON ( cuo1.option_id = cua1.option_id AND cuo1.field_id = 4 )
    ORDER BY p.name

  • Users who have thanked Old Pedant for this post:

    JohnDubya (05-15-2009)

  • #4
    Super Moderator JohnDubya's Avatar
    Join Date
    Nov 2006
    Location
    Missouri
    Posts
    634
    Thanks
    12
    Thanked 18 Times in 18 Posts
    Thanks so much for the responses, Old Pedant!

    In addition to your suggestion of the two ON statements, I added more in the WHERE clause. This is what ended up working for me:

    Code:
    SELECT p.person_id, p.name, cuo1.value 
    FROM ( (person p) 
    LEFT JOIN custom_attributes cua1 
            ON cua1.person_id = p.person_id 
    LEFT JOIN custom_options cuo1 
            ON ( cuo1.option_id = cua1.option_id AND cuo1.field_id = 4 ) )
    WHERE ( (cuo1.field_id = 4 OR cuo1.field_id IS NOT NULL) OR 
            (NOT EXISTS (SELECT 1 FROM custom_attributes tcua1 
            JOIN custom_options tcuo1 ON tcuo1.field_id = 4 
            WHERE tcua1.person_id = p.person_id AND tcua1.option_id = tcuo1.option_id)) ) 
    GROUP BY p.person_id, p.name 
    ORDER BY p.name
    That whole NOT EXISTS part searches for people who do not have entries in the custom_attributes table for field_id = 4. I got that code from a separate post and customized it for my query, so I'm assuming that's all it does. lol
    Last edited by JohnDubya; 05-15-2009 at 05:06 PM.

  • #5
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,211
    Thanks
    80
    Thanked 4,571 Times in 4,535 Posts
    Okay...now I see it. Makes sense.

    I wrote up a message a while back explaining why using WHERE on dependent table fields can convert outer joins to inner joins. I you would like to see it, I'll see if I can dig it up.

  • #6
    Super Moderator JohnDubya's Avatar
    Join Date
    Nov 2006
    Location
    Missouri
    Posts
    634
    Thanks
    12
    Thanked 18 Times in 18 Posts
    Absolutely. This is about fourth time I've run into an issue similar to this one, so the more I learn about it, the easier it will be next time. This stuff makes my head spin. lol Thanks again!

  • #7
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    27,211
    Thanks
    80
    Thanked 4,571 Times in 4,535 Posts
    When you use a dependent table in the WHERE clause after doing any kind of OUTER JOIN, you automatically effectively convert the outer join to an INNER JOIN.

    I posted this example WHY that happens way back in 2004 and it's so hard to find, now, that I just spent 15 minutes looking for it. So I'm gonna copy it here and save a link to it.

    ******************************************
    Let's assume we have the following data in two tables:
    Code:
    table: Cities
    id -- city
     1 -- LAX
     2 -- NYC
     3 -- SFO
     4 -- SEA
    
    table: Banks
    cityID -- HQ   -- bank
         2 -- True -- CitiBank
         3 -- True -- Bank of America
         3 -- False -- California Bank
    And then look at what happens when we use this query:
    Code:
    QUERY:
    SELECT C.city, B.bank
    FROM cities AS C LEFT JOIN banks AS B
    ON ( C.id = B.cityID )
    WHERE B.HQ = True
    First, do the query and get all the records *WITHOUT THE WHERE* clause.
    Thus:
    Code:
      id - city- cityID - HQ   - bank
    a) 1 - LAX - null   - null - null
    b) 2 - NYC - 2      - True - CitiBank
    c) 3 - SFO - 3      - True - Bank of America
    d) 3 - SFO - 3      - False- California Bank
    e) 4 - SEA - null   - null - null
    *NOW* apply the WHERE clause to that:
    ... WHERE HQ = True

    Well, for record "d", the right thing is done: The FALSE in HQ rejects that record.

    But now look at records "a" and "e". Is the HQ field in those records TRUE? Nope. It's NULL.

    SO THOSE RECORDS ALSO GET REJECTED!!!!

    So the final results will be:
    Code:
    NYC - Citibank
    SFO - Bank of America
    **************

    Now, compare that to this query:
    Code:
    SELECT C.city, B.bank
    FROM cities AS C LEFT JOIN banks AS B
    ON ( C.id = B.cityID AND B.HQ = True )
    Since the test on HQ is now part of the ON, it doesn't affect whether or not a "cities" record will be part of the result.

    And, indeed, you will correctly get final results of
    Code:
    LAX - null
    NYC - Citibank
    SFO - Bank of America
    SEA - null
    **************

    TRY IT!

    A comment I left out of the original post: *IF* the test you make in the WHERE clause checks ONLY for NULL, then it's okay! That is, if you use
    ... WHERE DependentTable.SomeField IS NULL
    (or IS NOT NULL, of course).

    You can see why, I assume.

  • The Following 2 Users Say Thank You to Old Pedant For This Useful Post:

    bazz (05-31-2009), JohnDubya (05-18-2009)


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •