Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 8 of 8
  1. #1
    New to the CF scene
    Join Date
    Jul 2010
    Posts
    9
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Union of 2 match queries and ordering by relevance of both

    Hi,

    I'm trying to query two unrelated tables (but retrieve related date). I've done this previously as 2 separate queries, but I'm trying to reduce the number of queries to increase performance.

    The 2 queries I use, are match queries (in Boolean Mode). I'm not an expert, but I'm trying to get this to perform as well as it can, and have read Match & Boolean Mode for searching over text columns is probably what I should be using. The problem I'm having is that I can order by one score but not the other:

    Code:
    SELECT URL, SUM, Title, 'dummy1', 'dummy2', 'dummy3', 'dummy4', 
      MATCH(TAG,SUM,Title) 
      AGAINST ('Landlord') as RelevanceA
      FROM rft_searchlawssum 
      WHERE MATCH (TAG,SUM,Title) AGAINST('Landlord' IN BOOLEAN MODE)
    UNION ALL
    SELECT g.URL as gURL, p.message AS pmessage, p.subject as psubject, g.idtype as gidtype, g.id as gid, p.pid as ppid, p.tid as ptid,   
      MATCH(p.message,p.subject) AGAINST ('Landlord') as RelevanceB
      FROM mybb_google_seo g LEFT JOIN mybb_posts p ON g.id = p.tid 
      WHERE MATCH(p.message,p.subject) 
      AGAINST('Landlord' IN BOOLEAN MODE) AND g.idtype ='4' 
      ORDER BY RelevanceA, RelavanceB DESC
    If I drop the RelavanceB this works, but with the RelavanceB I get:
    #1054 - Unknown column 'RelevanceB' in 'order clause'

    Any idea why I cant use RelavanceB in my order clause?

  • #2
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    Same reason you can only use the field *NAMES* from the FIRST SELECT in a UNION.

    Go on try it.

    Try to use pmessage or psubject in your PHP (or whatever) code that calls this query.

    Or, for that matter, try it in MySQL, alone.

    Just do:
    Code:
    SELECT pmessage 
    FROM (
         SELECT URL, SUM, Title, 'dummy1', 'dummy2', 'dummy3', 'dummy4' ...
         UNION ALL
         SELECT g.URL as gURL, p.message AS pmessage, p.subject as psubject, ...
    )
    Won't happen.

    Or just execute that UNION ALL query, alone, from the MySQL command line.

    Look at the names of the fields you get.

    Using AS to alias the names of fields in the second and subsequent SELECTs of a union is a waste of code (well, unless you use them in a HAVING clause...but even then the safer thing to do is alias them the same as the names of the first SELECT).

    In other words, what you probably should do (if you insist on using aliases in the second SELECT) is
    Code:
    SELECT URL, SUM, Title, 'dummy1', 'dummy2', 'dummy3', 'dummy4', 
      MATCH(TAG,SUM,Title) 
      AGAINST ('Landlord') as RelevanceA
      FROM rft_searchlawssum 
      WHERE MATCH (TAG,SUM,Title) AGAINST('Landlord' IN BOOLEAN MODE)
    UNION ALL
    SELECT g.URL, p.message AS SUM, p.subject as Title, g.idtype as dummy1, g.id as dummy2, p.pid as dummy3, p.tid as dummy4,   
      MATCH(p.message,p.subject) AGAINST ('Landlord') as RelevanceA
      FROM mybb_google_seo g LEFT JOIN mybb_posts p ON g.id = p.tid 
      WHERE MATCH(p.message,p.subject) 
      AGAINST('Landlord' IN BOOLEAN MODE) AND g.idtype ='4' 
      ORDER BY RelevanceA
    Or choose better names in your first SELECT.

    [And when you don't give a name to a field in a SELECT, MySQL uses the value as the name. Which is why 'dummy1' gets the name dummy1, etc.]
    Last edited by Old Pedant; 05-12-2011 at 12:21 AM.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #3
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    And by the way, the field types in the two SELECT lists must agree, in order.

    If SUM is a numeric field, as seems likely, and message is some kind of text field, then most DBs would reject the UNION with a type mismatch.

    MySQL allows it (I just tested, to verify), but the resulting field type of SUM will be the text type of message, since text can't be converted to a number but of course a number can be converted to a string.

    If you *need* to be able to treat the SUM field as a number, then you would be better off UNIONing it with a dummy numeric field from the second SELECT.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #4
    New to the CF scene
    Join Date
    Jul 2010
    Posts
    9
    Thanks
    1
    Thanked 0 Times in 0 Posts
    I see what you mean, I only get the 1st field names when running in phpMyAdmin

    I matched up the types (Sum is Summary, so its the same as p.message/ post Message, and the same column type)

    I'm going to switch the top and bottom quires, since I need the gid, ppid ... aliases.

    so it will be:

    Code:
    SELECT g.URL as gURL, p.message AS pmessage, p.subject as psubject, g.idtype as gidtype, g.id as gid, p.pid as ppid, p.tid as ptid,   
      MATCH(p.message,p.subject) AGAINST ('Landlord') as RelevanceB
      FROM mybb_google_seo g LEFT JOIN mybb_posts p ON g.id = p.tid 
      WHERE MATCH(p.message,p.subject) 
      AGAINST('Landlord' IN BOOLEAN MODE) AND g.idtype ='4' 
    UNION ALL
    SELECT URL, SUM, Title, 'dummy1', 'dummy2', 'dummy3', 'dummy4', 
      MATCH(TAG,SUM,Title) 
      AGAINST ('Landlord') as RelevanceA
      FROM rft_searchlawssum 
      WHERE MATCH (TAG,SUM,Title) AGAINST('Landlord' IN BOOLEAN MODE)
    ORDER BY RelevanceB DESC
    This returns better column names
    But I'm in the same situation, ordering by RelevanceA is not possible.. (with my limited understanding, anyway)
    Last edited by mike22; 05-12-2011 at 01:33 AM.

  • #5
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    Ummm...yes, it is.

    Because RelevanceB is a field in *BOTH* SELECTS, by virtue of the fact that it *IS* a field in the first SELECT and so the corresponding field in the other SELECT will get that same name, the ORDER BY will apply to the COMBINED UNION.

    Now...

    The problem you may have is that the relevance in one of those SELECTs may not be comparable to the relevance in the other.

    So maybe you'll have to apply a multiplier to one of them to get them into the same range of values.

    But you should be able to see that by simply inspecting the results.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    mike22 (05-12-2011)

  • #6
    New to the CF scene
    Join Date
    Jul 2010
    Posts
    9
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Ah ha, that does make sense.

    From your last example/explanation, I suspected it might have already been ordering by both, I just couldnt tell.

    For some reason, I'm getting results, but yet the relevance score is 0 ???
    (surely if it was 0, the data shouldn't even be pulled in from the Boolean match)

    The word 'Landlord' is sometimes in the url, sometimes in title and sometimes in the summary (and sometimes in 2 or all 3). From what I've read, a 0 should only happen if its not present in any, shouldn't it?

    It finds the right result, but all of the relevance scores are 0 (so doesn't ever order them 'right', before considering adding a co-efficient to make the scores relative)
    Last edited by mike22; 05-12-2011 at 02:44 AM.

  • #7
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,857
    Thanks
    78
    Thanked 4,417 Times in 4,382 Posts
    What happens if you run each SELECT separately? Do you see any relevance value numbers then?
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #8
    New to the CF scene
    Join Date
    Jul 2010
    Posts
    9
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Now I see my issue, since its a new project, it' currently a small database

    More than 50% of the current rows contain the word "Landlord". I found this mentioned about fullText searching:

    if a search word is found in more than 50% of the
    records of the table, then the word is thrown out as a factor in the
    weighting
    -I've since inserted a few dummy rows, and I do indeed get values back for RelevanceB (for results of the first and second query!). I hadn't investigated the issue enough, Old Pedant you certainly walked through it with me..

    I hope this thread helps anyone else with a similar issue. Thank you Old Pedant


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •