Thread: Extracting URL

  • #1
    New Coder
    Join Date
    Jan 2007
    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Extracting URL

    I need some help extracting the first URL from a string, whether it starts with http or https. Thanks.

    Input
    this is some text http://yahoo.com/directory/file.php?id=1&blah=1 this is some texthttp://msn.com this is some text
    this is some text https://google.com.

    Output
    http://yahoo.com/directory/file.php?id=1&blah=1

  • #2
    Senior Coder Rowsdower!'s Avatar
    Join Date
    Oct 2008
    Location
    Some say it's everything.
    Posts
    2,027
    Thanks
    5
    Thanked 397 Times in 390 Posts
    You need to explore the preg_match() function for this.

    Set up a regular expression to match the link pattern you need. Most likely you would set this up to read all data between "http" and the first blank space afterward. The exact code will depend on your project.
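A minimal sketch of that idea, assuming PHP 7.1 or later. The helper name `firstUrl` and the simple "everything up to the next whitespace" pattern are illustrative assumptions, not code from this thread:

```php
<?php
// Hypothetical helper: returns the first http/https URL in $text, or null if none.
function firstUrl(string $text): ?string
{
    // "http", an optional "s", "://", then every character up to the next whitespace.
    if (preg_match('~https?://\S+~i', $text, $m) === 1) {
        return $m[0];
    }
    return null;
}

$input = "this is some text http://yahoo.com/directory/file.php?id=1&blah=1 this is some texthttp://msn.com this is some text\n"
       . "this is some text https://google.com.";

echo firstUrl($input), "\n"; // http://yahoo.com/directory/file.php?id=1&blah=1
```

Because the pattern only stops at whitespace, trailing punctuation or markup that touches the URL would be included in the match.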
    The object of opening the mind, as of opening the mouth, is to shut it again on something solid. –G.K. Chesterton
    See Mediocrity in its Infancy
    It's usually a good idea to start out with this at the VERY TOP of your CSS: * {border:0;margin:0;padding:0;}
    Seek and you shall find... basically:
    validate your markup | view your page cross-browser/cross-platform | free web tutorials | free hosting

  • #3
    New Coder
    Join Date
    Jan 2007
    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Yeah, it really confuses me. I've tried on my own with no luck.

    Input
    this is some text http://yahoo.com/directory/file.php?id=1&blah=1 this is some texthttp://msn.com this is some text
    this is some text https://google.com.

    Output
    http://yahoo.com/directory/file.php?id=1&blah=1

    I would like to be able to control which url is extracted. If I have a variable set to 1, the output is http://yahoo.com/directory/file.php?id=1&blah=1. If the variable is set to 3, the output is https://google.com. Any ideas?

  • #4
    Senior Coder Rowsdower!'s Avatar
    Join Date
    Oct 2008
    Location
    Some say it's everything.
    Posts
    2,027
    Thanks
    5
    Thanked 397 Times in 390 Posts
    Quote Originally Posted by afrojojo View Post
    Yeah, it really confuses me. I've tried on my own with no luck.

    Input
    this is some text http://yahoo.com/directory/file.php?id=1&blah=1 this is some texthttp://msn.com this is some text
    this is some text https://google.com.

    Output
    http://yahoo.com/directory/file.php?id=1&blah=1

    I would like to be able to control which url is extracted. If I have a variable set to 1, the output is http://yahoo.com/directory/file.php?id=1&blah=1. If the variable is set to 3, the output is https://google.com. Any ideas?
    If you use preg_match_all() you will get an array of results, which you could then choose/use in any order you want to.
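As a rough sketch of picking a URL by position, assuming PHP 7.1 or later: the helper name `nthUrl`, the whitespace-delimited pattern, and the trimming of trailing punctuation are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the $n-th URL (1-based) found in $text, or null.
function nthUrl(string $text, int $n): ?string
{
    // preg_match_all() fills $matches[0] with every full match, in order of appearance.
    preg_match_all('~https?://\S+~i', $text, $matches);

    // Convert the 1-based position to a 0-based index; out-of-range positions give null.
    $url = $matches[0][$n - 1] ?? null;

    // Drop sentence punctuation that happens to touch the end of the URL.
    return $url === null ? null : rtrim($url, '.,;:!?');
}

$input = "this is some text http://yahoo.com/directory/file.php?id=1&blah=1 this is some texthttp://msn.com this is some text\n"
       . "this is some text https://google.com.";

echo nthUrl($input, 1), "\n"; // http://yahoo.com/directory/file.php?id=1&blah=1
echo nthUrl($input, 3), "\n"; // https://google.com
```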

  • #5
    New Coder
    Join Date
    Jan 2007
    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Code:
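    // Match http, https, or ftp URLs (case-insensitive).
    $pattern = '/((?:https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$])/i';
    if (preg_match_all($pattern, $string, $match)) {
        // $match[0] lists every full match; take the first one.
        $string = $match[0][0];
        echo $string;
    }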
    So I figured it out and came up with this. I had another question though.

    Example:
    $string = '<a href="http://google.com">Google</a> this is some text http://yahoo.com this is some texthttp://msn.com';

    If the string were what I have above, how do I make the pattern stop at a " or '? Otherwise the first match would be http://google.com">Google</a>, because it only stops at the first space. I would like it to stop at the first space, the first ", or the first '.

    So I would want $match[0][0] to be http://google.com, $match[0][1] to be http://yahoo.com, and so on.
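One way to state the "stop at a space or a quote" rule directly is a negated character class. The pattern below is an assumed alternative for illustration, not the one posted in this thread:

```php
<?php
// Assumed pattern: a URL runs until the first whitespace, quote, or angle bracket.
$pattern = '~(?:https?|ftp)://[^\s"\'<>]+~i';

$string = '<a href="http://google.com">Google</a> this is some text http://yahoo.com this is some texthttp://msn.com';

// $match[0] receives every full match, in order of appearance.
preg_match_all($pattern, $string, $match);

echo $match[0][0], "\n"; // http://google.com
echo $match[0][1], "\n"; // http://yahoo.com
echo $match[0][2], "\n"; // http://msn.com
```

A negated class accepts anything that is not a listed terminator, so a trailing period or comma would still be part of the match and may need trimming afterwards.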

  • #6
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,546
    Thanks
    45
    Thanked 259 Times in 256 Posts
    Well, not really the pattern I'd use to search for a url... Plus, unless I'm missing something, I don't see why it should catch an apostrophe or double quote anyway... I don't see either as valid characters in your expression?

    If someone types in just google.com or www.google.com do you wanna ignore it?

    A quick google search some time back helped me find this site

    http://www.regexguru.com/2008/11/det...block-of-text/

    The author has a pretty good regex you can use and explains it a bit.

  • #7
    New Coder
    Join Date
    Jan 2007
    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Keleth View Post
    Well, not really the pattern I'd use to search for a url... Plus, unless I'm missing something, I don't see why it should catch an apostrophe or double quote anyway... I don't see either as valid characters in your expression?

    If someone types in just google.com or www.google.com do you wanna ignore it?

    A quick google search some time back helped me find this site

    http://www.regexguru.com/2008/11/det...block-of-text/

    The author has a pretty good regex you can use and explains it a bit.
    I would like to ignore URLs without a protocol. The URLs I'm dealing with will always have a protocol.

    The reason I want to stop after apostrophe or quotes is because of the <a> tag using them to wrap the url.

    I don't want this - http://yahoo.com">Yahoo</a>
    I want this - http://yahoo.com

  • #8
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,546
    Thanks
    45
    Thanked 259 Times in 256 Posts
    Again... are you getting quotes in your test cases? Because I see nothing in your pattern that should match a quote of any kind...

  • #9
    New Coder
    Join Date
    Jan 2007
    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts


    Quote Originally Posted by Keleth View Post
    Again... are you getting quotes in your test cases? Because I see nothing in your pattern that should match a quote of any kind...
    That's why I need your help.

  • #10
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,546
    Thanks
    45
    Thanked 259 Times in 256 Posts
    Well... I ran the code you posted above and it works as expected. There are no problems: it stops at the quotes just like you wanted it to, and as it should. The code you gave pulls the 3 URLs perfectly. That's why I'm confused about what you need help with.
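For reference, a small script along these lines reproduces that check. The pattern is the one from post #5; the variable name `$urls` is just for illustration, and the example string is written with single quotes so the inner double quotes are legal PHP:

```php
<?php
// Pattern from post #5: neither " nor ' (nor < or >) appears in its character classes.
$pattern = '/((?:https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$])/i';

$string = '<a href="http://google.com">Google</a> this is some text http://yahoo.com this is some texthttp://msn.com';

preg_match_all($pattern, $string, $match);
$urls = $match[0]; // every full match, in order of appearance

foreach ($urls as $i => $url) {
    echo $i, ': ', $url, "\n";
}
// 0: http://google.com
// 1: http://yahoo.com
// 2: http://msn.com
```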

  • #11
    New Coder
    Join Date
    Jan 2007
    Posts
    14
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Really? It still gives me junk.

    http://yahoo.com">test</a>

  • #12
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,546
    Thanks
    45
    Thanked 259 Times in 256 Posts
    Sorry man, I copied and pasted your code verbatim and it works for me. You might wanna make sure you didn't change your local code since you posted it here. Again, I see NO reason why you should get any quotes in your results, and the fact that you are baffles me completely.

    Try copying and pasting your code from here into an online regex tester like http://gskinner.com/RegExr/ and see what happens.
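One thing that may be worth ruling out, offered purely as a guess since the thread never establishes the cause: if the input has been entity-encoded before matching (for example by `htmlspecialchars()`), the quote arrives as `&quot;`, and both `&` and `;` are allowed by the pattern. A browser would then render the over-long match as `http://yahoo.com">test</a>`. The sketch below assumes that scenario and decodes the text first:

```php
<?php
// Pattern from post #5.
$pattern = '/((?:https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$])/i';

// Assumption: the text was entity-encoded somewhere upstream.
$encoded = htmlspecialchars('<a href="http://yahoo.com">test</a>', ENT_QUOTES);

// Matching the encoded text runs straight through the entities.
preg_match($pattern, $encoded, $raw);

// Decoding first restores the literal quotes and brackets, which the pattern rejects.
preg_match($pattern, html_entity_decode($encoded, ENT_QUOTES), $clean);

echo $raw[0], "\n";   // http://yahoo.com&quot;&gt;test&lt;/a&gt
echo $clean[0], "\n"; // http://yahoo.com
```

Viewing the page source, or printing the string with `var_dump()`, would show whether the entities are really there.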

