  1. #1
    Regular Coder
    Join Date
    Feb 2007
    Posts
    113
    Thanks
    6
    Thanked 1 Time in 1 Post

    What could possibly be wrong here? (regex)

    I have a simple preg_match script here that is supposed to get the URL:

    PHP Code:
    <?php

    $string = "<a href=\"http://example.com/page.html\" title=\"my page\">";
    $get = preg_match('href.*?title', $string);
    echo $get;

    ?>
    I've tested my regex and it's correct, but when I run this code I get this error:
    Warning: preg_match() [function.preg-match]: Delimiter must not be alphanumeric or backslash in /homepages/21/h545454582/htdocs/pages/curl.php on line 4
    ...obviously there is no alphanumeric character or backslash in my delimiter, so can anybody please tell me what is going on here?
    Last edited by eapro; 08-14-2008 at 08:26 AM.

  • #2
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
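    You need to delimit your pattern. PCRE treats the first character of the pattern string as the delimiter, so without one it reads the h of href as the delimiter, which is what the warning complains about: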
    PHP Code:

    $get = preg_match('href.*?title', $string);
    // Change to:
    $get = preg_match('/href.*?title/', $string);
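    As a minimal sketch of the delimiter rule: the slash is only a convention, and other non-alphanumeric characters work too. The sample string is the one from the first post; the variable names are made up for this illustration.

```php
<?php
// Sample input copied from the first post.
$string = "<a href=\"http://example.com/page.html\" title=\"my page\">";

// The delimiter may be any character that is not alphanumeric,
// a backslash, or whitespace. All three calls use the same pattern.
$withSlash = preg_match('/href.*?title/', $string);
$withHash  = preg_match('#href.*?title#', $string);
$withTilde = preg_match('~href.*?title~', $string);

// Each call returns int(1) because the pattern matched.
var_dump($withSlash, $withHash, $withTilde);
```

    A delimiter other than / is handy when the pattern itself contains slashes, since they then need no escaping.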
    You will likely want to make the search more targeted. preg_match returns an integer indicating whether the pattern matched (1 for a match, 0 for none), not the matched text. So, you'll want something more like:
    PHP Code:
    $matches = array();
    preg_match('/<a href="(.*)" title="(.*)">.*<\/a>/msi', $string, $matches);
    The () are subpattern matches, so results will be stored in $matches[1] and $matches[2] for url and title. If you need multiple results, use preg_match_all, which continues the search and fills $matches with a multidimensional array. In that case, though, you'll likely need to force the pattern to be 'ungreedy' with the U modifier.
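    A rough sketch of both calls, assuming complete <a>...</a> links. The markup in $html is invented for this example, and the lazy quantifier .*? is used here in place of the U modifier; the two approaches have the same effect on these quantifiers.

```php
<?php
// Hypothetical markup with two complete links, made up for this example.
$html = '<a href="one.html" title="First">one</a> <a href="two.html" title="Second">two</a>';
$pattern = '/<a href="(.*?)" title="(.*?)">.*?<\/a>/si';

// preg_match stops at the first match and returns 1, 0, or false on error.
$found = preg_match($pattern, $html, $matches);
// $matches[1] holds the first captured URL, $matches[2] the first title.
echo $matches[1], ' / ', $matches[2], "\n";

// preg_match_all keeps searching and returns the number of matches.
// With PREG_SET_ORDER, each element of $all is one complete match.
$count = preg_match_all($pattern, $html, $all, PREG_SET_ORDER);
foreach ($all as $link) {
    echo $link[1], ' => ', $link[2], "\n";
}
```

    PREG_SET_ORDER is optional; without it the array is grouped by subpattern instead, so $all[1] would list every URL and $all[2] every title.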
    PHP Code:
    header('HTTP/1.1 420 Enhance Your Calm'); 
    Been gone for a few months, and haven't programmed in that long of a time. Meh, I'll wing it ;)

  • #3
    Regular Coder
    Join Date
    Feb 2007
    Posts
    113
    Thanks
    6
    Thanked 1 Time in 1 Post
    Thanks Fou-Lu, it worked after downgrading from PHP 5 to PHP 4. And I will take your suggestion as well, but I'm curious, what is the 'msi' at the end of the pattern?

  • #4
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
    Those are pattern modifiers. I'm not sure what your usage is, so I added them in. Here are their meanings:
    m - Multiline. Makes ^ and $ match at the start and end of every line in the string, not just at the start and end of the whole string.
    s - DotAll. Allows the wildcard '.' character to also match newline characters.
    i - Caseless. Allows pattern match regardless of string case.
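    A small sketch of what each of the three modifiers changes. The two-line input and the variable names are invented for this illustration.

```php
<?php
// Hypothetical two-line input, made up for this example.
$text = "First line\nsecond LINE";

// i: letters match regardless of case.
$caseless = preg_match('/line/i', 'LINE');

// Without m, ^ matches only at the very start of the string.
$noMultiline = preg_match('/^second/', $text);
// m: ^ also matches right after each newline.
$multiline = preg_match('/^second/m', $text);

// Without s, the dot stops at a newline.
$noDotAll = preg_match('/First.*LINE/', $text);
// s: the dot matches newlines too, so the match can span both lines.
$dotAll = preg_match('/First.*LINE/s', $text);

var_dump($caseless, $noMultiline, $multiline, $noDotAll, $dotAll);
```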
    i is probably the most common, followed by m. U makes the quantifiers ungreedy, which matters when you need every match from a text document or a scraped web page. The ungreedy modifier tells the pattern to stop at the first closing boundary instead of continuing. Otherwise, you'd likely get one result out of a document with 10 <a href="" title="">.</a> links, because the pattern matches from the first <a href=""... all the way to the very last </a>. Quantifiers are 'greedy' by default, so they take in as much data as they possibly can.
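    The greedy and ungreedy behaviour can be sketched like this, assuming a document with two links; the markup and variable names are invented for this example.

```php
<?php
// Hypothetical document with two links, made up for this example.
$html = '<a href="a.html" title="A">a</a><a href="b.html" title="B">b</a>';

// Greedy (default): each .* takes as much as it can,
// so a single match runs from the first <a to the last </a>.
$greedyCount = preg_match_all('/<a href=".*" title=".*">.*<\/a>/s', $html, $greedy);

// U modifier: quantifiers become ungreedy, so each link matches on its own.
$ungreedyCount = preg_match_all('/<a href=".*" title=".*">.*<\/a>/sU', $html, $ungreedy);

echo $greedyCount, " greedy match, ", $ungreedyCount, " ungreedy matches\n";
```

    Writing .*? on individual quantifiers instead of using U has the same effect and leaves the rest of the pattern greedy.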
    To be honest, a Perl pro can give you a much better explanation of patterns. When it comes to patterns, I'd recommend asking them before a PHP developer. Just mention it's in PHP.

    Did it not work in PHP 5? I'll have to check it out tomorrow, it's almost 3 am here so I should probably get some sleep :$

  • #5
    Regular Coder
    Join Date
    Feb 2007
    Posts
    113
    Thanks
    6
    Thanked 1 Time in 1 Post
    Great info there... just one more thing...

    For the string:
    Code:
    blah blah other stuffhere href="http://example.com/" blah blah other stuff here
    This regex...
    Code:
    /href(.*?)"/
    returns...
    http://example.com/"

    How would I get just the URL without the " at the end?
    Last edited by eapro; 08-14-2008 at 09:48 AM.

  • #6
    Regular Coder
    Join Date
    Feb 2007
    Posts
    113
    Thanks
    6
    Thanked 1 Time in 1 Post
    Actually, that did not work. Can you show me how to get just the URL, ignoring everything else around the URL?

    I'm not sure but I'm thinking it would have to use the items at the beginning and end of the url...
    Start:href=" End:" blah

  • #7
    Regular Coder the-dream's Avatar
    Join Date
    Mar 2007
    Location
    Northamptonshire, UK
    Posts
    477
    Thanks
    8
    Thanked 4 Times in 4 Posts
    Replace $result with the var that is holding the outputted http://example.com/"
    $result = str_replace('"', '', $result);
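    An alternative that avoids the clean-up step is to keep the quote out of the capture in the first place. This is only a sketch, assuming the URL is always wrapped in double quotes; the sample string is the one from the question above.

```php
<?php
// Sample input taken from the question above.
$string = 'blah blah other stuffhere href="http://example.com/" blah blah other stuff here';

// [^"]* matches any run of characters that are not a double quote,
// so the capture group ends right before the closing quote.
if (preg_match('/href="([^"]*)"/', $string, $matches)) {
    $url = $matches[1];
    echo $url, "\n";
}
```

    Because the quotes sit outside the parentheses, they are part of the full match in $matches[0] but not of the captured group in $matches[1].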

  • #8
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
    The reason it did not work is that it's not a 'complete' <a> tag. It will work if you use the format <a href="yourlink.html" title="">a</a>. If you have something like: 'blah blah href="yourlink.html" blah blah blah', you can match it with:
    PHP Code:
    $string = 'blah blah href="yourlink.html" blah blah blah';
    $matches = array();
    preg_match('/href="(.*)"/msi', $string, $matches);
    Which is similar to your original pattern.
    $matches[0] will contain the full match, and $matches[1] will contain yourlink.html. Note that the greedy (.*) runs to the last double quote in the string, so this relies on the href being the only quoted value. Be careful with arbitrary searches on href, though: this pattern captures whether or not the href sits inside an <a> tag. If you need to ignore the ones inside <a> tags, put a negative lookbehind such as (?<!<a ) in front of the pattern. A negative lookahead like (?!<a) would have no effect in that position, because the text there already starts with href.
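    The difference between matching every href and skipping the ones inside an <a> tag can be sketched as follows. The input string is invented, and the negative lookbehind (?<!<a ) is a choice made for this illustration; it assumes href is the first attribute, directly after "<a ".

```php
<?php
// Hypothetical input mixing a bare href with a real <a> tag, made up for this example.
$string = 'blah href="bare.html" blah <a href="tagged.html" title="">a</a>';

// Matches every href="...", whether or not it is inside a tag.
preg_match_all('/href="([^"]*)"/i', $string, $any);

// (?<!<a ) is a negative lookbehind: the match is rejected
// when the three characters before href are "<a ".
preg_match_all('/(?<!<a )href="([^"]*)"/i', $string, $bare);

print_r($any[1]);  // both URLs
print_r($bare[1]); // only the bare one
```

    A lookbehind in PCRE has to be fixed-length, so this does not cover tags where other attributes come before href.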

