Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 12 of 12
  1. #1
    New to the CF scene
    Join Date
    Oct 2012
    Location
    Las Vegas, NV, USA
    Posts
    4
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Issue with DISTINCT and ORDER BY with UNION

    I have been working on this Query to pull data from 3 different tables and use them to determine which rows to pull from another table and in what order. The script I have returns everything just fine, except it has duplicates. I tried using DISTINCT, but by doing so it removes the ORDER BY functionality.

    Here are the 3 tables:
    Code:
    posts
    -----
    roomId => int(11)
    postId => int(11)
    uniqueId => varchar(32)
    title => varchar(255)
    post => text
    image => varchar(100)
    createdBy => int(11)
    createdOn => int(11)
    editedBy => int(11)
    editedOn => int(11)
    deleted => int(1)
    
    comments
    -----
    postId => int(11)
    commentId => int(11)
    comment => text
    image => varchar(100)
    createdBy => int(11)
    createdOn => int(11)
    editedBy => int(11)
    editedOn => int(11)
    deleted => int(1)
    
    replies
    -----
    commentId => int(11)
    replyId => int(11)
    comment => text
    image => varchar(100)
    createdBy => int(11)
    createdOn => int(11)
    editedBy => int(11)
    editedOn => int(11)
    deleted => int(1)
    Here is the SQL I wrote:
    Code:
    SELECT DISTINCT(p.postId), p.roomId, p.uniqueId, p.title, p.post, p.image, p.createdOn, r.created
    FROM posts p
    INNER JOIN (
      SELECT postId, created FROM (
        (
          SELECT p.postId, p.createdOn AS created
          FROM posts p
          WHERE p.deleted = 0
        )
        UNION
        (
          SELECT p.postId, c.createdOn AS created
          FROM comments c
          LEFT JOIN posts p
          ON p.postId = c.postId
          WHERE p.deleted = 0
          AND c.deleted = 0
        )
        UNION
        (
          SELECT p.postId, r.createdOn AS created
          FROM replies r
          LEFT JOIN comments c
          ON c.commentId = r.commentId
          LEFT JOIN posts p
          ON p.postId = c.postId
          WHERE p.deleted = 0
          AND c.deleted = 0
          AND r.deleted = 0
        )
      ) AS everything
    ) r
    ON p.postId = r.postId
    WHERE p.deleted = 0
    ORDER BY r.created DESC
    Should look like:
    Code:
    3 | 1 | x1b2cxd4e5f6y7h8i9j0k102m3nzo5p6 | This is a test Post 2 | The path of the righteous man is beset on all side... | 1348041600 | 1348041602
    2 | 1 | a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6 | This is a test Post   | The path of the righteous man is beset on all side... | 1348041600 | 1348041601
    4 | 1 | pq72cxd4e5f6y7igbej0k102m3nzo5pt | This is a test Post 3 | The path of the righteous man is beset on all side... | 1348041600 | 1348041600
    But returns like this:
    Code:
    2 | 1 | a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6 | This is a test Post   | The path of the righteous man is beset on all side... | 1348041600 | 1348041600
    3 | 1 | x1b2cxd4e5f6y7h8i9j0k102m3nzo5p6 | This is a test Post 2 | The path of the righteous man is beset on all side... | 1348041600 | 1348041600
    4 | 1 | pq72cxd4e5f6y7igbej0k102m3nzo5pt | This is a test Post 3 | The path of the righteous man is beset on all side... | 1348041600 | 1348041600
    This is all just sample data. But the concept should still be the same. If anybody has any ideas please let me know.

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    Can you show the result of doing *JUST* the UNION part of that?

    And you only show 3 records in the "returns like this". I assume it is actually returning many more?

    And you apparently misunderstand how SELECT DISTINCT works.

    It applies to *ALL* the fields in your SELECT. If *ANY* fields in the two records are different, then the records are different and the DISTINCT will show them.

    Putting parentheses around the field name, as you did with
    Code:
    SELECT DISTINCT(p.postId), p.roomId, ...
    is meaningless. It changes nothing. Remove them and you will get identical results.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #3
    New to the CF scene
    Join Date
    Oct 2012
    Location
    Las Vegas, NV, USA
    Posts
    4
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Old Pedant View Post
    Can you show the result of doing *JUST* the UNION part of that?
    Here is the output that I get:
    Code:
    2 | 1348041600
    3 | 1348041600
    4 | 1348041600
    2 | 1348041601
    3 | 1348041602
    Quote Originally Posted by Old Pedant View Post
    And you only show 3 records in the "returns like this". I assume it is actually returning many more?
    Right now there is very limited data in the system for basic testing.

    Quote Originally Posted by Old Pedant View Post
    And you apparently misunderstand how SELECT DISTINCT works.

    It applies to *ALL* the fields in your SELECT. If *ANY* fields in the two records are different, then the records are different and the DISTINCT will show them.

    Putting parentheses around the field name, as you did with
    Code:
    SELECT DISTINCT(p.postId), p.roomId, ...
    is meaningless. It changes nothing. Remove them and you will get identical results.
    The only reason I put parentheses around the field name was because I saw it in a tutorial and was trying it out. But like you said, it returns the same as without the parentheses.

  • #4
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    Quote Originally Posted by TGeene View Post
    The only reason I put parentheses around the field name was because I saw it in a tutorial and was trying it out. But like you said, it returns the same as without the parentheses.
    Well, that's one tutorial you can mark off your places to look at for help, I would say.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #5
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    Quote Originally Posted by TGeene View Post
    Here is the output that I get:
    Code:
    2 | 1348041600
    3 | 1348041600
    4 | 1348041600
    2 | 1348041601
    3 | 1348041602
    So that makes sense.

    But then how come you only got 3 records of output from the entire query? You should have gotten 5.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #6
    New to the CF scene
    Join Date
    Oct 2012
    Location
    Las Vegas, NV, USA
    Posts
    4
    Thanks
    1
    Thanked 0 Times in 0 Posts
    I am trying to output only one instance of any one postId. That is where the DISTINCT comes in. But whenever I add a DISTINCT to the query it completely ignores the ORDER BY function.

  • #7
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    I assume you know that doing UNION alone is that same as doing SELECT DISTINCT?

    If you want *all* results from a UNION, you must use UNION ALL.

    Not clear to me why you have the extra level of SELECT in there.

    Try this variation on your query. It might prove instructional/helpful:
    Code:
    SELECT p.postId, p.roomId, p.uniqueId, p.title, p.post, p.image, p.createdOn, r.created, r.what
    FROM posts p
    INNER JOIN (
        SELECT 'posts' AS what, p.postId, p.createdOn AS created
        FROM posts p
        WHERE p.deleted = 0
            UNION ALL
        SELECT 'comments', p.postId, c.createdOn 
        FROM comments c
        LEFT JOIN posts p
        ON p.postId = c.postId
        WHERE p.deleted = 0
        AND c.deleted = 0
            UNION ALL
        SELECT 'replies', p.postId, r.createdOn
        FROM replies r
        LEFT JOIN comments c
        ON c.commentId = r.commentId
        LEFT JOIN posts p
        ON p.postId = c.postId
        WHERE p.deleted = 0
        AND c.deleted = 0
        AND r.deleted = 0
        ) r
    ON p.postId = r.postId
    WHERE p.deleted = 0
    ORDER BY r.created DESC
    (So long as I have the "what" field in there, the UNION ALL isn't needed, since you can't get duplicates, but it won't hurt and you may [or may not...depends on what you want] need it when/if you pull out the "what" field.)
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #8
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    Quote Originally Posted by TGeene View Post
    I am trying to output only one instance of any one postId. That is where the DISTINCT comes in. But whenever I add a DISTINCT to the query it completely ignores the ORDER BY function.
    AHH! Okay...ignore my just prior post. I have an idea.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #9
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    Try this...
    Code:
    SELECT p.postId, p.roomId, p.uniqueId, p.title, p.post, p.image, p.createdOn, r.created
    FROM posts p
    INNER JOIN (
       SELECT u.postid, MAX(u.createdOn) AS created
       FROM (
         SELECT p.postId, p.createdOn
         FROM posts p
         WHERE p.deleted = 0
             UNION 
         SELECT p.postId, c.createdOn 
         FROM comments c
         LEFT JOIN posts p
         ON p.postId = c.postId
         WHERE p.deleted = 0
         AND c.deleted = 0
             UNION
         SELECT p.postId, r.createdOn
         FROM replies r
         LEFT JOIN comments c
         ON c.commentId = r.commentId
         LEFT JOIN posts p
         ON p.postId = c.postId
         WHERE p.deleted = 0
         AND c.deleted = 0
         AND r.deleted = 0
       ) AS u
       GROUP BY u.postid
    ) AS r
    ON p.postId = r.postId
    WHERE p.deleted = 0
    ORDER BY r.created DESC
    I think that's right. Note how using MAX( ) ensures that, indeed, you will get only one "created" date/time for each postid.

    If I have the sense of it wrong, you might want to change MAX( ) to MIN( ).
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    TGeene (10-09-2012)

  • #10
    New to the CF scene
    Join Date
    Oct 2012
    Location
    Las Vegas, NV, USA
    Posts
    4
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Your query returns the following:
    Code:
    3 | 1 | x1b2cxd4e5f6y7h8i9j0k102m3nzo5p6 | This is a test Post 2 | The path of the righteous man is beset on all side... | 1348041600 | 1348041602 | replies
    2 | 1 | a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6 | This is a test Post | The path of the righteous man is beset on all side... | 1348041600 | 1348041601 | comments
    2 | 1 | a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6 | This is a test Post | The path of the righteous man is beset on all side... | 1348041600 | 1348041600 | posts
    3 | 1 | x1b2cxd4e5f6y7h8i9j0k102m3nzo5p6 | This is a test Post 2 | The path of the righteous man is beset on all side... | 1348041600 | 1348041600 | posts
    4 | 1 | pq72cxd4e5f6y7igbej0k102m3nzo5pt | This is a test Post 3 | The path of the righteous man is beset on all side... | 1348041600 | 1348041600 | posts
    3 | 1 | x1b2cxd4e5f6y7h8i9j0k102m3nzo5p6 | This is a test Post 2 | The path of the righteous man is beset on all side... | 1348041600 | 1348041600 | comments
    3 | 1 | x1b2cxd4e5f6y7h8i9j0k102m3nzo5p6 | This is a test Post 2 | The path of the righteous man is beset on all side... | 1348041600 | 1348041600 | replies
    This is an improvement on the code, but it is still returning duplicate postId's. Which is one of the main issues I am having.

    (P.S. This is my first time working with UNION, so I had no idea UNION ALL existed.)

  • #11
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    But I think MAX( ) is right. You want the most recent post/comment/reply for each original post, yes?
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #12
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    Part of your problem is your data: You have both a comment and a reply for post #3 that happen at exactly the same time.

    But I think my MAX( ) trick will take care of this, regardless.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.


  •  

    Tags for this Thread

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •