  1. #1
    Senior Coder
    Join Date
    Aug 2009
    Location
    Mansfield, Nottinghamshire, UK
    Posts
    1,555
    Thanks
    57
    Thanked 148 Times in 147 Posts

    Newbie converting from PHP - Regular expression

    Hi all, I'm a PHP man, but a certain script of mine is not performing well in PHP, so I'm trying to learn Perl.

    Could someone point me in the right direction as to why this is not working?

    Code:
    #!/usr/bin/perl
    my $url = 'http://www.actwebdesigns.co.uk/';
    
    use LWP::Simple;
    my $content = get $url;
    die "Couldn't get $url" unless defined $content; #die unless content is found
    
    if(@links = ($content =~ m#<a[^>]+)>#isg)){
    	foreach (@links) {
    	print $_ . "\n";
    	}
    }else{
    	print "Could not find links.";
    }
    Any help much appreciated.
    Website Design Mansfield
    PHP Code:
    function I_LOVE(){function b(&$b='P'){$b.='P';}function a($_){return $_++;}$b='P';define("B",'H');b($b=implode('',array($b=a($b),$b=a(B))));b($b);return $b;}
    echo 
    I_LOVE(); 

  • #2
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,877
    Thanks
    2
    Thanked 164 Times in 159 Posts
    You're missing the opening capturing paren in the regex and the closing paren is in the wrong place.

    All Perl scripts should include the strict and warnings pragmas. Those pragmas will point out lots of problems that can be difficult to track down.
    In this case the error is
    Unmatched ) in regex; marked by <-- HERE in m/<a[^>]+) <-- HERE / at ..
    Here's the corrected version.
    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use LWP::Simple;
    
    my $url = 'http://www.actwebdesigns.co.uk/';
    my $content = get $url;
    die "Couldn't get $url" unless defined $content; #die unless content is found
    
    if(my @links = ($content =~ m#(<a[^>]+>)#isg)){
    
    	print "$_\n" for @links;
    
    }else{
    	print "Could not find links.";
    }
    Last edited by FishMonger; 12-06-2009 at 05:07 PM.
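For anyone following along without network access, here is a minimal sketch of the same fix using a hard-coded string in place of the fetched page. The sample markup is made up for illustration; the pattern is the corrected one from the post above.

```perl
use strict;
use warnings;

# Hypothetical sample markup standing in for the page fetched with LWP::Simple
my $content = '<p><a href="/one.html">One</a> and <A HREF="/two.html" class="x">Two</A></p>';

# "my" declares @links, which "use strict" requires.
# The parentheses inside the pattern form the capture group; with /g in
# list context the match returns every captured opening tag.
my @links = $content =~ m#(<a[^>]+>)#isg;

print "$_\n" for @links;
```

Because of the /i modifier, the upper-case tag is matched as well as the lower-case one.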

  • #3
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,877
    Thanks
    2
    Thanked 164 Times in 159 Posts
    You may want to look at:

    HTML::LinkExtor - Extract links from an HTML document
    http://search.cpan.org/~gaas/HTML-Pa...L/LinkExtor.pm
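A rough sketch of how HTML::LinkExtor might be used for this task, assuming the module is installed from CPAN. The sample markup and the filter on a and area tags are assumptions chosen to mirror the thread, not something taken from the module's documentation.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Made-up sample markup; in the thread's script this would be $content
my $html = '<a href="/about.html">About</a> <area href="contact.php"> <img src="logo.png">';

# Without a callback, the parser collects the links internally
my $parser = HTML::LinkExtor->new;
$parser->parse($html);
$parser->eof;

my @hrefs;
for my $link ($parser->links) {
    # Each entry is an array reference: [ $tag, attribute => url, ... ]
    my ($tag, %attr) = @$link;
    next unless $tag eq 'a' || $tag eq 'area';
    push @hrefs, $attr{href} if defined $attr{href};
}

print "$_\n" for @hrefs;
```

The parser also reports the img src link, which is why the loop filters on the tag name.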

  • #4
    Senior Coder
    Join Date
    Aug 2009
    Location
    Mansfield, Nottinghamshire, UK
    Posts
    1,555
    Thanks
    57
    Thanked 148 Times in 147 Posts
    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    print "\n";
    print "Enter a website: http://www.";
    my $url = 'http://www.'.<>.'/';
    
    use LWP::Simple;
    my $content = get $url;
    die "Couldn't get $url" unless defined $content; #die unless content is found
    
    #@pagesArray = (url);
    #@foundPagesArray = ();
    #@mainPagesArray = ();
    
    if(@links = ($content =~ m#<(?:(?:a)|(?:area))[^>]*href=\"([^\"\#\?]+(?:(?:\.html)|(?:\.php)|(?:\.aspx)|(?:\.htm)|(?:\.asp)|(?:\.shtml)|(?:/)))\"#isg)){
    	foreach (@links) {
    		print $_ . "\n";
    	}
    }else{
    	print "Could not find links.";
    }
    Lovely, yet again!

    I fixed that and added the strict and warnings pragmas, but now I'm getting error messages.

    One is:

    Global symbol "@links" requires explicit package name at C:\perlscripts\hello.pl line 17.

    The other thing is that I changed

    Code:
    my $url = 'http://www.website.co.uk/';
    to
    Code:
    my $url = 'http://www.'.<>.'/';
    but it keeps saying that it couldn't get the content, when it could before. Any ideas?

  • #5
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    Quote Originally Posted by Phil Jackson View Post
    Code:
    if(my @links = ($content =~ m#<(?:(?:a)|(?:area))[^>]*href=\"([^\"\#\?]+(?:(?:\.html)|(?:\.php)|(?:\.aspx)|(?:\.htm)|(?:\.asp)|(?:\.shtml)|(?:/)))\"#isg)){
    	foreach (@links) {
    		print $_ . "\n";
    	}
    }else{
    	print "Could not find links.";
    }
    Lovely, yet again!

    I fixed that and added the strict and warnings pragmas, but now I'm getting error messages.

    One is:

    Global symbol "@links" requires explicit package name at C:\perlscripts\hello.pl line 17.
    Under the strict pragma you must declare a variable with my or our before you first use it. If you look at FishMonger's code you will see a my before @links.

    Code:
    my $url = 'http://www.'.<>.'/';
    but it keeps saying that it couldn't get the content, when it could before. Any ideas?
    <> returns the line including its line terminator (\n on *nix, \r on classic Mac OS, \r\n on Windows), so the newline ends up in the middle of your URL.
    You can use chomp to remove it.

    A URL can be mailto:..., absolute ('http://....') or relative (simply 'index.php'). It will be hard to reimplement all of that in a regex. As for the loops, for and foreach are synonyms in Perl, so the print ... for @links form in FishMonger's code is just shorter, not faster.
    Therefore it is better to use the HTML::LinkExtor module; see FishMonger's post for the link.

    best regards
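A small sketch of what chomp does, using a hard-coded string with a trailing newline to stand in for the line read by <>. The domain name is a placeholder.

```perl
use strict;
use warnings;

# Simulated input line, as <> would return it (note the trailing newline)
my $domain = "example.co.uk\n";

# chomp edits $domain in place; its return value is the number of
# characters removed, not the cleaned string.
my $removed = chomp($domain);

my $url = "http://www.$domain/";
print "removed $removed character(s), url is $url\n";
```

Without the chomp call, the newline would sit between the domain and the trailing slash, which is why the fetch failed.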

  • #6
    Senior Coder
    Join Date
    Aug 2009
    Location
    Mansfield, Nottinghamshire, UK
    Posts
    1,555
    Thanks
    57
    Thanked 148 Times in 147 Posts
    Thanks, I will look into that. Is this how chomp would be used?

    Code:
    my $content = get "http://www" . chomp($url) . "/";

  • #7
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    Quote Originally Posted by Phil Jackson View Post
    Thanks, I will look into that. Is this how chomp would be used?

    Code:
    my $content = get "http://www" . chomp($url) . "/";
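    Not quite. chomp modifies its argument in place and returns the number of characters it removed, not the string, so chomp $url on its own line first and then build the address from $url. (The string also needs a dot after www.)

    A suggestion: you can use Data::Dumper the way you use var_dump in PHP to see the content of one or more variables. For example: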

    Code:
    use Data::Dumper;
    # pass the array by reference so it is dumped as a single structure
    print Dumper(\@links, $content, $url);
    best regards
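A self-contained sketch of the Data::Dumper suggestion, assuming some made-up sample data in place of the script's real variables. Setting $Data::Dumper::Sortkeys is optional and only there to make the hash output stable.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order when dumping hashes

# Sample data for illustration only
my @links = ('/one.html', '/two.html');
my %info  = (url => 'http://www.example.co.uk/', count => scalar @links);

# Passing references keeps each structure together in the dump:
# the array becomes $VAR1 and the hash becomes $VAR2.
my $dump = Dumper(\@links, \%info);
print $dump;
```

Passing @links without the backslash also works, but then each element is dumped as its own $VARn entry.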

  • #8
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,877
    Thanks
    2
    Thanked 164 Times in 159 Posts
    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use LWP::Simple;
    
    print "\nEnter a website: http://www.";
    chomp(my $domain = <>);
    
    my $url = "http://www.$domain";
    my $content = get $url;
    
    die "Couldn't get $url" unless defined $content; #die unless content is found
    
    #@pagesArray = (url);
    #@foundPagesArray = ();
    #@mainPagesArray = ();
    
    # complex regex's like this are rarely needed
    # and in most cases they're the wrong approach
    # this one could be reduced, but the module I pointed to would be better.
    if(my @links = ($content =~ m#<(?:(?:a)|(?:area))[^>]*href=\"([^\"\#\?]+(?:(?:\.html)|(?:\.php)|(?:\.aspx)|(?:\.htm)|(?:\.asp)|(?:\.shtml)|(?:/)))\"#isg)){
    	foreach (@links) {
    		print $_ . "\n";
    	}
    }else{
    	print "Could not find links.";
    }
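The comment in the code above says the pattern could be reduced. One possible reduction, not necessarily the one FishMonger had in mind, factors the shared dot out of the extension list and drops the redundant non-capturing groups. The sample markup is invented for the demonstration.

```perl
use strict;
use warnings;

# Invented sample markup: two links that should match and two that should not
my $content = '<a href="index.html">Home</a>'
            . '<a href="page.php?x=1">Query</a>'
            . '<area href="/dir/">'
            . '<a href="pic.png">Pic</a>';

# Same tags and same endings as the long pattern:
# .html .htm .shtml .php .asp .aspx or a trailing slash
my @links = $content =~ m{<(?:a|area)[^>]*href="([^"\#?]+(?:\.(?:html?|shtml|php|aspx?)|/))"}isg;

print "$_\n" for @links;
```

The link with a query string is skipped because the pattern, like the original, requires the closing quote straight after the extension.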

  • #9
    Super Moderator
    Join Date
    May 2005
    Location
    Southern tip of Silicon Valley
    Posts
    2,877
    Thanks
    2
    Thanked 164 Times in 159 Posts
    Code:
    use warnings;
    use YAPE::Regex::Explain;
    
    print YAPE::Regex::Explain->new('m#<(?:(?:a)|(?:area))[^>]*href=\"([^\"\#\?]+(?:(?:\.html)|(?:\.php)|(?:\.aspx)|(?:\.htm)|(?:\.asp)|(?:\.shtml)|(?:/)))\"#isg')->explain;
    Outputs:
    Code:
    The regular expression:
    
    (?-imsx:m#<(?:(?:a)|(?:area))[^>]*href=\"([^\"\#\?]+(?:(?:\.html)|(?:\.php)|(?:\.aspx)|(?:\.htm)|(?:\.asp)|(?:\.shtml)|(?:/)))\"#isg)
    
    matches as follows:
      
    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      m#<                      'm#<'
    ----------------------------------------------------------------------
      (?:                      group, but do not capture:
    ----------------------------------------------------------------------
        (?:                      group, but do not capture:
    ----------------------------------------------------------------------
          a                        'a'
    ----------------------------------------------------------------------
        )                        end of grouping
    ----------------------------------------------------------------------
       |                        OR
    ----------------------------------------------------------------------
        (?:                      group, but do not capture:
    ----------------------------------------------------------------------
          area                     'area'
    ----------------------------------------------------------------------
        )                        end of grouping
    ----------------------------------------------------------------------
      )                        end of grouping
    ----------------------------------------------------------------------
      [^>]*                    any character except: '>' (0 or more times
                               (matching the most amount possible))
    ----------------------------------------------------------------------
      href=                    'href='
    ----------------------------------------------------------------------
      \"                       '"'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        [^\"\#\?]+               any character except: '\"', '\#', '\?'
                                 (1 or more times (matching the most
                                 amount possible))
    ----------------------------------------------------------------------
        (?:                      group, but do not capture:
    ----------------------------------------------------------------------
          (?:                      group, but do not capture:
    ----------------------------------------------------------------------
            \.                       '.'
    ----------------------------------------------------------------------
            html                     'html'
    ----------------------------------------------------------------------
          )                        end of grouping
    ----------------------------------------------------------------------
         |                        OR
    ----------------------------------------------------------------------
          (?:                      group, but do not capture:
    ----------------------------------------------------------------------
            \.                       '.'
    ----------------------------------------------------------------------
            php                      'php'
    ----------------------------------------------------------------------
          )                        end of grouping
    ----------------------------------------------------------------------
         |                        OR
    ----------------------------------------------------------------------
          (?:                      group, but do not capture:
    ----------------------------------------------------------------------
            \.                       '.'
    ----------------------------------------------------------------------
            aspx                     'aspx'
    ----------------------------------------------------------------------
          )                        end of grouping
    ----------------------------------------------------------------------
         |                        OR
    ----------------------------------------------------------------------
          (?:                      group, but do not capture:
    ----------------------------------------------------------------------
            \.                       '.'
    ----------------------------------------------------------------------
            htm                      'htm'
    ----------------------------------------------------------------------
          )                        end of grouping
    ----------------------------------------------------------------------
         |                        OR
    ----------------------------------------------------------------------
          (?:                      group, but do not capture:
    ----------------------------------------------------------------------
            \.                       '.'
    ----------------------------------------------------------------------
            asp                      'asp'
    ----------------------------------------------------------------------
          )                        end of grouping
    ----------------------------------------------------------------------
         |                        OR
    ----------------------------------------------------------------------
          (?:                      group, but do not capture:
    ----------------------------------------------------------------------
            \.                       '.'
    ----------------------------------------------------------------------
            shtml                    'shtml'
    ----------------------------------------------------------------------
          )                        end of grouping
    ----------------------------------------------------------------------
         |                        OR
    ----------------------------------------------------------------------
          (?:                      group, but do not capture:
    ----------------------------------------------------------------------
            /                        '/'
    ----------------------------------------------------------------------
          )                        end of grouping
    ----------------------------------------------------------------------
        )                        end of grouping
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      \"                       '"'
    ----------------------------------------------------------------------
      #isg                     '#isg'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

  • #10
    Senior Coder
    Join Date
    Aug 2009
    Location
    Mansfield, Nottinghamshire, UK
    Posts
    1,555
    Thanks
    57
    Thanked 148 Times in 147 Posts
    Hmm, very helpful material guys. Shall keep on chuggin through!

  • #11
    Senior Coder
    Join Date
    Aug 2009
    Location
    Mansfield, Nottinghamshire, UK
    Posts
    1,555
    Thanks
    57
    Thanked 148 Times in 147 Posts
    Sorry again but does anybody know if there is a function to count the number of elements in an array?

  • #12
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    Quote Originally Posted by Phil Jackson View Post
    Sorry again but does anybody know if there is a function to count the number of elements in an array?
    Code:
    print scalar @links;
    You will often find $#links in scripts as well; that is the index of the last element, i.e. one less than the count.

    best regards
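A short sketch contrasting the two forms mentioned above, with a made-up three-element array.

```perl
use strict;
use warnings;

my @links = ('a.html', 'b.html', 'c.html');   # sample data

my $count      = scalar @links;   # number of elements
my $also_count = @links;          # an array in scalar context gives the same count
my $last_index = $#links;         # index of the last element, i.e. count - 1

print "count=$count last_index=$last_index\n";
```

For an empty array the count is 0 and $#links is -1.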

  • #13
    Senior Coder
    Join Date
    Aug 2009
    Location
    Mansfield, Nottinghamshire, UK
    Posts
    1,555
    Thanks
    57
    Thanked 148 Times in 147 Posts
    I was wondering if you could help. I wanted to be able to find the value and key from an array, so I did some googling and found this:

    Code:
    # initialize a hash structure (name, age)
    my %persons = (John => 25, Anne =>32, Paul =>22, Smith => 29);
    foreach my $name ( %persons ) {
      print $name . $persons{$name} . "\n";
    }
    I'm hoping someone could explain it a little.
    1) How come the array uses % and not @?
    2) I'm struggling to work out how this has come to be the key: $persons{$name}

    Any help much appreciated... again.

  • #14
    Senior Coder
    Join Date
    Aug 2009
    Location
    Mansfield, Nottinghamshire, UK
    Posts
    1,555
    Thanks
    57
    Thanked 148 Times in 147 Posts
    Does $persons{$name} mean "get key from %persons where value = $name value"?

  • #15
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    Quote Originally Posted by Phil Jackson View Post
    Does $persons{$name} mean "get key from %persons where value = $name value"?
    Edit: no, it is the other way round. foreach takes each element of the list and assigns it to $name, and $persons{$name} then returns the value stored under that key.


    keys and values return the keys or the values of a hash as a list, and foreach expects a list inside the parentheses:
    Code:
    # initialize a hash structure (name, age)
    my %persons = (John => 25, Anne =>32, Paul =>22, Smith => 29);
    foreach my $name (keys %persons ) {
      print $name . $persons{$name} . "\n";
    }
    Unlike with PHP's print, you can pass Perl's print a comma-separated list, so you can use , instead of . like this:
    Code:
    print $name, $persons{$name}, "\n";
    best regards
    Last edited by oesxyl; 12-06-2009 at 07:49 PM.
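To tie the two questions together: % is the sigil for a hash (Perl's associative array) and @ is the sigil for an ordinary array, while $persons{$name} is a single value looked up by key. A minimal sketch with the same data follows; the sort and the output wording are additions for the demonstration.

```perl
use strict;
use warnings;

# %persons is a hash: names are the keys, ages are the values
my %persons = (John => 25, Anne => 32, Paul => 22, Smith => 29);

my @lines;
# keys returns the names in no particular order, so sort them for stable output
for my $name (sort keys %persons) {
    # $persons{$name} fetches the value (age) stored under the key $name
    push @lines, "$name is $persons{$name}";
}
print "$_\n" for @lines;

# each returns one key/value pair per call, another common way to walk a hash
my $total = 0;
while (my ($name, $age) = each %persons) {
    $total += $age;
}
print "total age: $total\n";
```

Iterating over %persons directly, as in the googled snippet, mixes keys and values in one list, which is why the keys call matters.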

