Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 9 of 9
  1. #1
    Regular Coder
    Join Date
    May 2009
    Location
    Moore, OK
    Posts
    282
    Thanks
    11
    Thanked 41 Times in 41 Posts

    Cool Psuedo Full Text Search on MySQL InnoDB Tables

    I have no idea how efficient this is, but it is something I've thought might help others who want to perform a full text search in MySQL on a innoDB table. It basically makes use of the if statement with the length and locate functions.

    NOTE: I have not done a lot of testing using this as I just thought it up. If you can improve on it please let me know.

    So here we go!

    First we have our test table we are going to search on:

    Code:
    CREATE TABLE IF NOT EXISTS `Sites` (
      `site_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `site_name` varchar(255) NOT NULL DEFAULT '',
      `site_domain` varchar(255) NOT NULL DEFAULT '',
      PRIMARY KEY (`site_id`),
      UNIQUE KEY `u_site_domain` (`site_domain`),
      KEY `i_site_name` (`site_name`)
    ) ENGINE=InnoDB;
    Fill it with a few examples:

    Code:
    INSERT INTO `Sites` (`site_name`, `site_domain`) VALUES
    ('Dev', 'dev'),
    ('Test', 'test'),
    ('Test 2', 'test-2.org'),
    ('Coyote6 GraphX', 'coyote6.com'),
    ('MySQL', 'mysql.com'),
    ('PHP', 'php.net');
    Now for our search query (you can add limit and order by to it if you want):

    Code:
    SELECT `site_id` as `id`, `site_domain` as `domain`, `site_name` as `name` 
    FROM `Sites` 
    WHERE IF(LENGTH(:n) > 0, LOCATE(:n, `site_name`) > 0, TRUE) && IF(LENGTH(:d) > 0, LOCATE(:d, `site_domain`) > 0, TRUE)
    The :n & :d are our place holders for the PHP insert.
    • :n for the site name to search for
    • :d for the domain to search for

    Now say we want to search for a .org site. We will substitute the :d for '.org' and leave the name a blank string.

    Code:
    SELECT `site_id` as `id`, `site_domain` as `domain`, `site_name` as `name` 
    FROM `Sites` 
    WHERE IF(LENGTH('') > 0, LOCATE('', `site_name`) > 0, TRUE) && IF(LENGTH('.org') > 0, LOCATE('.org', `site_domain`) > 0, TRUE)
    This returns Test 2 - test-2.org

    Now lets say we want to search for a .com account that contains sql in its name.

    Code:
    SELECT `site_id` as `id`, `site_domain` as `domain`, `site_name` as `name` 
    FROM `Sites` 
    WHERE IF(LENGTH('sql') > 0, LOCATE('sql', `site_name`) > 0, TRUE) && IF(LENGTH('.com') > 0, LOCATE('.com', `site_domain`) > 0, TRUE)
    This returns MysQL - mysql.com

    Hope this is useful to everyone.
    Last edited by Coyote6; 09-13-2013 at 02:33 PM. Reason: Marking as Resolved

  • #2
    Regular Coder
    Join Date
    May 2009
    Location
    Moore, OK
    Posts
    282
    Thanks
    11
    Thanked 41 Times in 41 Posts
    As someone mentioned in another forum, I forgot about using full text search for multiple words. I was hoping I had found a solution without a mysql function, but it doesn't seem likely. This is still a hell of a lot faster than using
    Code:
    `column` LIKE '%search%'

  • #3
    Regular Coder
    Join Date
    May 2009
    Location
    Moore, OK
    Posts
    282
    Thanks
    11
    Thanked 41 Times in 41 Posts
    In order to actually search for multiple words like a full text search you have to create a function in mysql, which can limit your abilities to do this on shared hosting.

    The function:

    Code:
    DELIMITER $$
    DROP FUNCTION IF EXISTS `search`$$
    CREATE FUNCTION `search` (in_field TEXT, in_words TEXT)
    RETURNS BOOLEAN
    
    BEGIN
    
    DECLARE found BOOLEAN;
    DECLARE loc INT;
    DECLARE remaining_text TEXT;
    DECLARE current_text TEXT;
    
    SET found = FALSE;
    SET remaining_text = in_words;
    
    WHILE remaining_text != '' DO
    
      SET loc = LOCATE(' ', remaining_text);
    
      IF loc > 0 THEN
        SET current_text = SUBSTR(remaining_text, 1, loc - 1);
      ELSE
        SET current_text = remaining_text;
      END IF;
    
      SET loc = LOCATE(current_text, in_field);
    
      IF loc > 0 THEN
        SET found = TRUE;
        SET remaining_text = '';
      ELSE
        SET remaining_text = SUBSTR(remaining_text, CHAR_LENGTH(current_text) + 2);
      END IF;
    
    END WHILE;
    
    RETURN found;
    END$$
    DELIMITER;
    Then replace the locate function in the if statement with the search function. The field or text to be searched is the first parameter (haystack) and the second is the words to search for (needle).

    Code:
    SELECT `site_id` as `id`, `site_domain` as `domain`, `site_name` as `name` 
    FROM `Sites` 
    WHERE IF(LENGTH('sql'), search(`site_name`, 'graph sql'), TRUE) && IF(LENGTH('.com'), search(`site_domain`, 'multi words .com'), TRUE)
    Now we have our a multi word search that returns true if any of the words are matched. The results from this query should my the MySQL site and Coyote6 GraphX.

  • #4
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,536
    Thanks
    80
    Thanked 4,490 Times in 4,454 Posts
    ...which can limit your abilities to do this on shared hosting.
    Nonsense. You can do the same thing in PHP code, no need for the MySQL function. And you can even make it much more flexible:

    Let's take this query of your as an example:
    Code:
    SELECT `site_id` as `id`, `site_domain` as `domain`, `site_name` as `name` 
    FROM `Sites` 
    WHERE IF(LENGTH(:n) > 0, LOCATE(:n, `site_name`) > 0, TRUE) 
      AND IF(LENGTH(:d) > 0, LOCATE(:d, `site_domain`) > 0, TRUE)
    And build it up in PHP code:
    Code:
    function createSearchOn( $fldname, $values, $andOr )
    {
        $search = "";
        $delimiter = "";
    
        $words = explode( " ", $values );
        for ( $w = 0; $w < count($words); ++$w )
        {
            $word = mysql_real_escape_string($words[$w]);
            $search .= "$delimiter LOCATE( $fldname, '$word' ) ";
            $delimiter = $andOr;
        }
        return "( $search )";
    }
    
    
    $whereAnd = " WHERE ";
    $sql = "SELECT site_id, site_domain, site_name FROM sites ";
    
    // example of getting search terms:
    $allOrAny = $_POST["allOrAny"]; // allow user to choose search all, search any
    $conjunction = $allOrAny == "ANY" ? " OR " : " AND ";
    
    $sitename = $_POST["sitename"];
    if ( isSet($sitename) )
    {
        $sql .= $whereAnd . createSearchOn("site_name",$sitename,$conjunction);
        $whereAnd = $conjunction;
    }
    $sitedomain = $_POST["sitedomain"];
    if ( isSet($sitedomain) )
    {
        $sql .= $whereAnd . createSearchOn("site_domain",$sitedomain,$conjunction);
        $whereAnd = $conjunction;
    }
    I don't use PHP, so pardon any typos.

    But I think you can see how much flexibility there is this way.

    The two places I user $conjunction *could* be separately chosen by the user:
    -- Find matches only if all words found in all fields.
    -- Find matches if all words found in one or more fields
    -- Find matches if any word found in any field
    -- Find matches if any word found in all fields.

    Gives a lot more flexibility to the user than hard coding it all in the MySQL code.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #5
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,536
    Thanks
    80
    Thanked 4,490 Times in 4,454 Posts
    But you know, I just did a bit of testing, and I found that using LOCATE() was measurably *SLOWER* than using LIKE. Pretty much consistently, I get times 15% to 20% *FASTER* using LIKE.

    Which is not surprising to me. LIKE should be faster, as it is used so much more often. One would hope that the query designers worked hard to make it faster.

    So I'm wondering how much testing you did, comparing the two, under what circumstances.

    Given my results, I'd recommend changing my code to
    Code:
    function createSearchOn( $fldname, $values, $andOr )
    {
        $search = "";
        $delimiter = "";
    
        $words = explode( " ", $values );
        for ( $w = 0; $w < count($words); ++$w )
        {
            $word = mysql_real_escape_string($words[$w]);
            $search .= "$delimiter $fldname LIKE '%$word%' ";
            $delimiter = $andOr;
        }
        return "( $search )";
    }
    (Rest of my code stays the same.)
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #6
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,536
    Thanks
    80
    Thanked 4,490 Times in 4,454 Posts
    Oh, and by the by, note that my code handles apostrophes in the search words. Yours doesn't. A minor fix needed, but clearly needed.

    Also, my code eliminates the need for your IF(LENGTH('sql') > 0 (et al.) tests in the SQL code, which presumably would give a minor performance boost.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #7
    Regular Coder
    Join Date
    May 2009
    Location
    Moore, OK
    Posts
    282
    Thanks
    11
    Thanked 41 Times in 41 Posts
    Hi Old Pedant,

    Thanks for the comments. The goal I was trying to accomplish was to build it without having to build a dynamic query in php. I personally would prefer to keep everything within a stored procedure without having to worry about entering a concat SQL string into it. I did not spend the time to write and or statements to make it more flexible.

    As for the testing, I was testing on a smaller scale. It was running the entire search query on the smaller table at around .0006-.0009 seconds per query. The simple LOCATE function ran .0004 seconds per query, where as LIKE was running anywhere from .0044-.0650 seconds per query. After your comment, I went and tested it on a larger table (~4,000 rows, which is still not that big). As I mentioned in my previous statement I hadn't done a lot of testing. You are right. Once the table size was enlarged, LIKE stayed consistent on its results, LOCATE took about the same as LIKE .0046 seconds, where as my full search function took about a quarter of a second (because it is searching for all the words). But this trend would not be good on even larger tables. I may play around with it some more and switch the search function to use LIKE to see how that may change. Is there a better way to index for LIKE searches other than a standard table index?

    As for the php part, I would recommend using PDO because mysql_real_escape_string is depreciated. To do it that way you would want to make your loop return the string and parameters together so that you keep the structure right.

    PHP Code:
    function createSearchOn$fldname$values$andOr )
    {
        
    $search "";
        
    $delimiter "";
        
    $params = array();

        
    $words explode" "$values );
        for ( 
    $w 0$w count($words); ++$w )
        {
            
            
    $params[':' $fldname $w] =$words[w];
            
    $search .= "$delimiter $fldname LIKE '%:" $fldname $w "%' ";
            
    $delimiter $andOr;
        }
        return array (
    'search' => $search'params'=>$params);

    Then after each time you ran this function you would have to combine the parameters into one master array, and the search into one master string to then attach to the PDO statement.

    And yes using PHP for loops and calculations is a heck of a lot faster than MySQL, but like I said the goal was to try to make strictly MySQL solution. I may have to rethink that if this process is too slow on really large tables.

    Thanks again for the comments.

  • #8
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    26,536
    Thanks
    80
    Thanked 4,490 Times in 4,454 Posts
    Well, you probably have read that I don't use PHP. The most I have done is write two pages of PHP code where I did, at least, use mysqli. So PDO isn't something I've dipped my fingers into.

    I have, however, used parameterized queries in JSP and ASP and ASP.NET, so I totally agree with the idea of using PDO.

    Regarding better indexes for LIKE (and for LOCATE, for that matter): Unfortunately, MySQL is almost braindead when it comes to using an indexed field and a LIKE condition. The *ONLY* time MySQL will use and index when performing a LIKE is when you use WHERE field LIKE 'prefix%'. That is, when the only wild card character is at the *end* of the think to be LIKEd. A leading wildcard (LIKE '%xxx') or embedded wildcard (LIKE 'abc%xxx') causes MySQL to ignore the index completely.

    This is just one of many ways in which MySQL's usage of indexes simply doesn't come close to the performance of the "big boys" (SQL Server and Oracle, especially).
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • Users who have thanked Old Pedant for this post:

    Coyote6 (09-16-2013)

  • #9
    Regular Coder
    Join Date
    May 2009
    Location
    Moore, OK
    Posts
    282
    Thanks
    11
    Thanked 41 Times in 41 Posts
    Totally agree with the indexing Hopefully that is something Oracle will look into the future with MySQL. I know they get hit up a lot for the Full Text Search feature. I remember emailing them myself when they first made the default switch to InnoDB. If I recall correctly it was something they considered add in the future, but who knows when that may be or if they ran into some sort of road block. I was just hoping maybe I was missing something easy that would make it run better.


  •  

    Tags for this Thread

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •