  1. #1
    Regular Coder
    Join Date
    Nov 2007
    Location
    127.0.0.1
    Posts
    348
    Thanks
    26
    Thanked 40 Times in 39 Posts

    Regular Expression - retrieving website url

    Hi all,
    I need help with one more regex.

    I'd like to retrieve the original URLs of sites from Yahoo search results.

    For example:
    1. www.example.com

    2. subdomain.example1.com (subdomain)

    If I go for this expression:
    Code:
    http://*[^/]*
    I only get http://uk.wrs.yahoo.com from both links.

    But how do I retrieve the sites listed above, which appear at the end of each link below?


    Code:
    http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A54S7HAx.;_ylu=X3oDMTEwZTl2dThqBHNlYwNzcgRwb3MDNwRjb2xvA2luMl9pbnRsBHZ0aWQD/
    SIG=11kp4e70q/EXP=1219835696/**http%3A//www.example.com/index.html
    
    http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A5YS7HAx.;_ylu=X3oDMTEwaXAxMWJuBHNlYwNzcgRwb3MDNQRjb2xvA2luMl9pbnRsBHZ0aWQD/
    SIG=11r0pjgq9/EXP=1219835696/**http%3A//subdomain.example1.com/index.html
    Thank you
    Last edited by tagnu; 08-26-2008 at 02:22 PM.
    Blog Charity:Water
    WhatisWrongWith.me/tagnu - Send me anonymous feedback.

  • #2
    Supreme Master coder! Philip M's Avatar
    Join Date
    Jun 2002
    Location
    London, England
    Posts
    18,250
    Thanks
    203
    Thanked 2,557 Times in 2,535 Posts
    This should move you forward:-


    Code:
    <script type = "text/javascript">
    
    var a = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A54S7HAx.;_ylu=X3oDMTEwZTl2dThqBHNlYwNzcgRwb3MDNwRjb2xvA2luMl9pbnRsBHZ0aWQD/" + "SIG=11kp4e70q/EXP=1219835696/**http%3A//www.example.com/index.html"
    
    var b = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A5YS7HAx.;_ylu=X3oDMTEwaXAxMWJuBHNlYwNzcgRwb3MDNQRjb2xvA2luMl9pbnRsBHZ0aWQD/" + "SIG=11r0pjgq9/EXP=1219835696/**http%3A//subdomain.example1.com/index.html"
    
    var x = a.match(/(http%3A.+)/);
    x[0] = x[0].replace (/\%3A/,":")
    alert (x[0]);
    
    var y = b.match(/(http%3A.+)/);
    y[0] = y[0].replace (/\%3A/,":")
    alert (y[0]);
    
    </script>

    With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.

  • The Following 2 Users Say Thank You to Philip M For This Useful Post:

    abduraooft (08-26-2008), tagnu (08-29-2008)

  • #3
    Supreme Master coder! abduraooft's Avatar
    Join Date
    Mar 2007
    Location
    N/A
    Posts
    14,864
    Thanks
    160
    Thanked 2,224 Times in 2,211 Posts
    With sufficient thrust, pigs fly just fine. However, this is not necessarily a good idea. It is hard to be sure where they are going to land, and it could be dangerous sitting under them as they fly overhead.
    Lol, I thought the above code is for something else.
    The Dream is not what you see in sleep; Dream is the thing which doesn't let you sleep. --(Dr. APJ. Abdul Kalam)

  • #4
    Banned
    Join Date
    May 2005
    Location
    Midwest, U.S.
    Posts
    118
    Thanks
    1
    Thanked 26 Times in 23 Posts
    Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
       "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    <title>Any Title</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <script type="text/javascript">
    
    	var nStr1 = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A54S7HAx.;_ylu=X3oDMTEwZTl2dThqBHNlYwNzcgRwb3MDNwRjb2xvA2luMl9pbnRsBHZ0aWQD/SIG=11kp4e70q/EXP=1219835696/**http%3A//www.example.com/index.html";
    	var nStr2 = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A5YS7HAx.;_ylu=X3oDMTEwaXAxMWJuBHNlYwNzcgRwb3MDNQRjb2xvA2luMl9pbnRsBHZ0aWQD/SIG=11r0pjgq9/EXP=1219835696/**http%3A//subdomain.example1.com/index.html";
    
    	function getDomain(urlStr){
    
    		var nDomain = urlStr.substring(urlStr.lastIndexOf('//')+2,urlStr.lastIndexOf('/'));
    		return nDomain;
    	}
    
    	function init(){
    
    		alert(getDomain(nStr1));
    		alert(getDomain(nStr2));
    	}
    
    	onload = init;
    	
    </script>
    </head>
    	<body>
    		
    	</body>
    </html>

  • Users who have thanked Cranford for this post:

    tagnu (08-29-2008)

  • #5
    Supreme Master coder! Philip M's Avatar
    Join Date
    Jun 2002
    Location
    London, England
    Posts
    18,250
    Thanks
    203
    Thanked 2,557 Times in 2,535 Posts
    Another example showing that there are more ways than one of killing a cat.

  • #6
    Regular Coder
    Join Date
    Nov 2007
    Location
    127.0.0.1
    Posts
    348
    Thanks
    26
    Thanked 40 Times in 39 Posts
    Thank you Philip, the code works fine.
    But in my case, I get 'http://example.com' in certain cases apart from 'http%3A//example.com'.

    I was really looking for an expression that would accommodate both cases.

  • #7
    Regular Coder
    Join Date
    Nov 2007
    Location
    127.0.0.1
    Posts
    348
    Thanks
    26
    Thanked 40 Times in 39 Posts
    Quote Originally Posted by Cranford View Post
    Code:
    	var nStr1 = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A54S7HAx.;_ylu=X3oDMTEwZTl2dThqBHNlYwNzcgRwb3MDNwRjb2xvA2luMl9pbnRsBHZ0aWQD/SIG=11kp4e70q/EXP=1219835696/**http%3A//www.example.com/index.html";
    	var nStr2 = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A5YS7HAx.;_ylu=X3oDMTEwaXAxMWJuBHNlYwNzcgRwb3MDNQRjb2xvA2luMl9pbnRsBHZ0aWQD/SIG=11r0pjgq9/EXP=1219835696/**http%3A//subdomain.example1.com/index.html";
    
    	function getDomain(urlStr){
    
    		var nDomain = urlStr.substring(urlStr.lastIndexOf('//')+2,urlStr.lastIndexOf('/'));
    		return nDomain;
    	}
    
    	function init(){
    
    		alert(getDomain(nStr1));
    		alert(getDomain(nStr2));
    	}
    
    	onload = init;
    	
    </script>
    Cranford, thank you for the effort. Your snippet suits my needs and I'm going with it for now, but I'm still curious whether a regex can do this.
    Prepending 'http://' to nDomain makes it easier to use the result as an attribute value on any HTML element.

    Without 'http://', the browser treats the value as a relative URL and prefixes it with the current domain. In that case you'll get

    'http://uk.wrs.yahoo.com/example.com'

    PHP Code:
    function getDomain(urlStr){

        var nDomain = urlStr.substring(urlStr.lastIndexOf('//')+2,urlStr.lastIndexOf('/'));
        return 'http://' + nDomain;
    }

    PS: I'm learning regex using Expresso. I'll update this post as soon as I find a good regex.
    Last edited by tagnu; 08-29-2008 at 02:27 PM.

  • #8
    Supreme Master coder! Philip M's Avatar
    Join Date
    Jun 2002
    Location
    London, England
    Posts
    18,250
    Thanks
    203
    Thanked 2,557 Times in 2,535 Posts
    Quote Originally Posted by tagnu View Post
    Thank you Philip, the code works fine.
    But in my case, I get 'http://example.com' in certain cases apart from 'http%3A//example.com'.

    I was really looking for an expression that would accommodate both cases.
    Code:
    <script type = "text/javascript">
    
    var a = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A54S7HAx.;_ylu=X3oDMTEwZTl2dThqBHNlYwNzcgRwb3MDNwRjb2xvA2luMl9pbnRsBHZ0aWQD/" + "SIG=11kp4e70q/EXP=1219835696/**http%3A//www.example.com/index.html"
    
    var b = "http://uk.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A5YS7HAx.;_ylu=X3oDMTEwaXAxMWJuBHNlYwNzcgRwb3MDNQRjb2xvA2luMl9pbnRsBHZ0aWQD/" + "SIG=11r0pjgq9/EXP=1219835696/**http%3A//subdomain.example1.com/index.html"
    
    a = a.replace(/\%3A/,":");  
    var x = a.match(/[^\b](http.+)/);
    x[0] = x[0].replace (/./,"");
    alert (x[0]);
    
    b = b.replace(/\%3A/,":");
    var y = b.match(/[^\b](http.+)/);
    y[0] = y[0].replace (/./,"");
    alert (y[0]);
    
    </script>

    Taking the liberty of modifying Cranford's solution:-
    Code:
    	function getDomain(urlStr){
    
    		var nDomain = urlStr.substring(urlStr.lastIndexOf('**')+2,urlStr.lastIndexOf('/'));
    		return nDomain;
    	}
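
    A minimal sketch along the same lines, assuming the target always follows the '**' marker as in the two sample links. It decodes the target with decodeURIComponent so that the 'http%3A//' and 'http://' forms come out the same. getTargetOrigin is a made-up name for illustration, not something from Cranford's code.

```javascript
// Hypothetical helper: returns scheme + host of the URL wrapped after '**'.
function getTargetOrigin(urlStr) {
    var marker = urlStr.lastIndexOf('**');
    if (marker === -1) {
        return null; // no wrapped target in this link
    }
    // decodeURIComponent turns %3A into ':' and leaves a plain 'http://' unchanged.
    // Note: it throws a URIError if the string contains a malformed % escape.
    var target = decodeURIComponent(urlStr.substring(marker + 2));
    // Cut at the first '/' after the scheme's '//', if there is one.
    var hostStart = target.indexOf('//') + 2;
    var pathStart = target.indexOf('/', hostStart);
    return pathStart === -1 ? target : target.substring(0, pathStart);
}
```

    Because the result already starts with 'http://', it can be used as a link target without the relative-URL problem mentioned earlier in the thread.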

    You can test your regular expressions at: http://www.claughton.clara.net/regextester.html
    Last edited by Philip M; 08-30-2008 at 09:30 AM. Reason: Modify Cranford's solution

  • #9
    Regular Coder
    Join Date
    Nov 2007
    Location
    127.0.0.1
    Posts
    348
    Thanks
    26
    Thanked 40 Times in 39 Posts
    Thank you!

    Got the regex http(.){1,4}\/\/[^\/]*/g

    Description:
    http followed by
    (.){1,4} any characters, minimum 1 and maximum 4 (to match : and %3A, and also to include https),
    \/\/ then // (escaped, so \/\/)
    [^\/]* any run of characters other than / (escaped, so \/)
    /g return all occurrences of the match

    PHP Code:
    var urlStr = "http://in.wrs.yahoo.com/_ylt=A8pWBj2w5bNIa84A54S7HAx.;_ylu=X3oDMTEwZTl2dThqBHNlYwNzcgRwb3MDNwRjb2xvA2luMl9pbnRsBHZ0aWQD/SIG=11kp4e70q/EXP=1219835696/**http%3A//www.thesdf.org/index.html";

    // The slashes must be escaped inside a regex literal, otherwise // starts a comment.
    var res = urlStr.match(/http(.){1,4}\/\/[^\/]*/g);
    document.write("count:" + res.length + "<br />");

    for (var i = 0; i < res.length; i++)
        document.write(res[i] + "<br/>");
    PS: don't forget the /g
    With the g flag, match() returns an array containing all the matches; without it, only the first match is returned. In either case it returns null if no match is found.
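
    To make the difference concrete, here is a small sketch. The sample string and the name findLinks are invented for the example; the pattern is the one from this post.

```javascript
var s = "http://a.example/x/**http%3A//b.example/y";
var re = /http(.){1,4}\/\/[^\/]*/;

// Without g: first match plus capture groups.
var first = s.match(re);
// first[0] is "http://a.example", first[1] is ":" (the last character captured by (.))

// With g: every matched substring, no capture groups.
var all = s.match(new RegExp(re.source, "g"));
// all is ["http://a.example", "http%3A//b.example"]

// No match gives null with or without g, so guard before reading .length.
var none = "no links here".match(re); // null

// Hypothetical wrapper that always returns an array.
function findLinks(str) {
    return str.match(/http(.){1,4}\/\/[^\/]*/g) || [];
}
```

    With the wrapper, a loop over the result is safe even when the input contains no links.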
    I'm learning!
    Helpful resources: http://www.javascriptkit.com/javatutors/redev3.shtml
    Last edited by tagnu; 08-30-2008 at 08:24 PM. Reason: added description for the regex + changed regex to include https

  • #10
    Supreme Master coder! Philip M's Avatar
    Join Date
    Jun 2002
    Location
    London, England
    Posts
    18,250
    Thanks
    203
    Thanked 2,557 Times in 2,535 Posts
    Quote Originally Posted by tagnu View Post
    Thank you!

    Got the regex http(.){1,3}\/\/[^\/]*/g

    Description:
    http followed by
    (.){1,3} any characters, min 1 or max 3 :, %3A,
    \/\/ and // (escaped so \/\/)
    [^\/]* any character except / (escaped so \/)
    /g return all occurrences
    To be picky, that does not work for https://
    So make it http(.){1,4}\/\/[^\/]*/g

  • #11
    Regular Coder
    Join Date
    Nov 2007
    Location
    127.0.0.1
    Posts
    348
    Thanks
    26
    Thanked 40 Times in 39 Posts
    Quote Originally Posted by Philip M View Post
    To be picky, that does not work for https://
    So make it http(.){1,4}\/\/[^\/]*/g
    That's true! Thanks for pointing that out.

    So a better regex is
    http(.){1,4}\/\/[^\/]*/g

    Updated the previous post too.
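
    One thing worth noting: (.){1,4} accepts any one to four characters after http, so it would also match something like httpXY//. A possible tighter variant, offered as a sketch rather than a tested replacement, spells out the two forms seen in this thread (':' and '%3A') and makes the 's' optional. extractLinks and linkPattern are hypothetical names.

```javascript
// Sketch: match http or https, followed by ':' or its encoded form '%3A'.
var linkPattern = /https?(?::|%3A)\/\/[^\/]*/gi;

function extractLinks(urlStr) {
    // match() with the g flag returns all matches, or null when there are none.
    var found = urlStr.match(linkPattern) || [];
    var links = [];
    for (var i = 0; i < found.length; i++) {
        // Normalise the encoded colon so every result starts with http:// or https://
        links.push(found[i].replace(/%3A/i, ":"));
    }
    return links;
}
```

    For a Yahoo redirect link the last element of the returned array is the target site, and the first is the Yahoo host itself.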
    Last edited by tagnu; 08-30-2008 at 08:25 PM.

