  1. #1
    Banned
    Join Date
    Jul 2009
    Posts
    52
    Thanks
    10
    Thanked 4 Times in 4 Posts

    Trickier regex - definitely stuck

    Okay, my first experiment self-lesson worked fine...

    Code:
    $_="do re mi"; 
    print  if /^([^ ]+) +(\1)/;
    
    $_="ra ra ra";
    print  if /^([^ ]+) +(\1)/;
    prints out ra ra ra, but not do re mi

    but what i want to do is if it is "b b c" print it and if it is "k k k" not print it...


    I tried this and it buggered up...

    Code:
    $_="b b c";
    print  if /^([^ ]+) +(\1) +([^\1])/;
    
    
    $_="k k k";
    print  if /^([^ ]+) +(\1) +([^\1])/;
    the output was...

    Code:
    perl woo3
    b b ck k k
    So how can I achieve what I want, i.e. make it so that it has to be

    sheep sheep cow

    and not

    sheep sheep sheep?

    Presumably there's got to be a way of using the \1 to solve it? Anyone?

    this code
    Code:
    $_="b b c";
    print  if /^([^ ]+) +(\1) +[^\1]/;  
    
    
    $_="k k k";
    print  if /^([^ ]+) +(\1) +[^\1]/;  
    had the same result, so I guess one can leave it at that, the other brackets were superfluous to my failed attempt. So can anyone correct this latest piece of code... how can I preclude kkk and allow bbc?

    I see why it's wrong: the ^ inside [ ] apparently only says no to a single character, nothing longer than that.

    So is there a way to say no to the entire word contained in \1?
    Last edited by RabidMango; 07-20-2009 at 06:22 PM.

  • #2
    Senior Coder
    Join Date
    Mar 2006
    Posts
    1,274
    Thanks
    2
    Thanked 39 Times in 38 Posts
    A negated character class can exclude a single character, but not a "word": a character class matches exactly one character, and the order of the characters listed inside the brackets is irrelevant. The same goes for the shortcut character classes such as \w, \W and \s, which each match one character. On top of that, \1 is not treated as a backreference inside a character class.
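A minimal sketch of why the `[^\1]` attempt lets "k k k" through, using the pattern from the first post. The helper name `class_matches` and the third test string are made up for illustration. As far as I know, Perl reads `\1` inside brackets as an octal escape for the single character chr(1), so `[^\1]` accepts almost any character.

```perl
use strict;
use warnings;

# Pattern from the first post. Inside [...] the \1 is not a backreference;
# Perl reads it as an octal escape for the single character chr(1).
my $re = qr/^([^ ]+) +(\1) +[^\1]/;

# Hypothetical helper: returns 1 if the string matches the pattern, else 0.
sub class_matches {
    my ($text) = @_;
    return $text =~ $re ? 1 : 0;
}

for my $text ("b b c", "k k k", "k k \x01") {
    # Make the control character visible when printing.
    (my $shown = $text) =~ s/\x01/\\x01/;
    printf "%-10s %s\n", $shown, class_matches($text) ? "matches" : "no match";
}
```

Both "b b c" and "k k k" match, which agrees with the output shown in the first post. Only a third field starting with chr(1) is rejected.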

    What you want is a look ahead assertion, more formally called a zero-width look ahead assertion. I will provide you a link instead of some code because it looks like you are good at solving your own questions:

    http://perldoc.perl.org/perlretut.ht...looking-behind

  • Users who have thanked KevinADC for this post:

    RabidMango (07-20-2009)

  • #3
    Banned
    Join Date
    Jul 2009
    Posts
    52
    Thanks
    10
    Thanked 4 Times in 4 Posts
    Thanks for that. I went there, read what you suggested, had a go with a ?= and found a way to do it...

    Code:
    $_="b b c";
    print  if /^([^ ]+) +(\1) (?=\1)/;
    
    
    $_="k k k";
    print  if /^([^ ]+) +(\1) (?=\1)/;
    produces just kkk (and i wanted bbc) so...

    Code:
    $_="b b c";
    print  unless /^([^ ]+) +(\1) (?=\1)/;
    
    
    $_="k k k";
    print  unless /^([^ ]+) +(\1) (?=\1)/;
    should do the trick...

    I'll just test it (if it fails I'll edit the whole post, so this isn't really live, I'm just going through the motions)...

    Code:
    perl woo6
    b b c
    nice, it worked.

  • #4
    Banned
    Join Date
    Jul 2009
    Posts
    52
    Thanks
    10
    Thanked 4 Times in 4 Posts
    Code:
    $_="b b c";
    print  unless /^([^ ]+) +(\1) \1/;
    
    
    $_="k k k";
    print  unless /^([^ ]+) +(\1) \1/;
    (produces b b c as the only output)

    also works fine, and is clearly preferable... looks like I should hurry up and get back to reading about what in Davy Crockett ?= is actually for, since what I tried to do with it was pointless: it gives exactly the same result with just the \1
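A small sketch of what `(?=...)` changes, assuming the two patterns from the posts above. Both succeed or fail on the same strings, but the lookahead is zero-width, so it checks the third word without consuming it. The helper `matched_text` is a made-up name. It returns `$&`, the text the match consumed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text consumed by the match ($&),
# or undef if the pattern does not match.
sub matched_text {
    my ($re, $text) = @_;
    return $text =~ $re ? $& : undef;
}

my $with_lookahead = qr/^([^ ]+) +(\1) (?=\1)/;   # from post #3
my $with_backref   = qr/^([^ ]+) +(\1) \1/;       # from post #4

for my $text ("b b c", "k k k") {
    for my $pair (["(?=\\1)", $with_lookahead], ["\\1", $with_backref]) {
        my ($label, $re) = @$pair;
        my $got = matched_text($re, $text);
        printf "%-6s %-8s %s\n", $text, $label,
            defined $got ? "consumed '$got'" : "no match";
    }
}
```

For "k k k" the lookahead version consumes `k k ` (stopping before the third word) while the plain backreference consumes `k k k`. That makes no difference to a bare `print if` or `print unless`, which is why the two looked identical here.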

    in fact i reckon i can clean it up some more...

    Code:
    $_="b b c";
    print  unless /^([^ ]+) +(\1) \1/;
    
    
    $_="k k k";
    print  unless /^([^ ]+) +(\1) \1/;
    wait a minute, my solution is wrong...

    it shouldn't match a b c, but it does, alas
    obviously it's the way i've used unless - i've turned the universe on its head and everything's fallen down

    here's the right solution

    Code:
    $_="a b c";
    if (/^([^ ]+) +(\1)/){
    print  unless /^([^ ]+) +(\1) \1/;
    }
    
    $_="b b c";
    if (/^([^ ]+) +(\1)/){
    print  unless /^([^ ]+) +(\1) \1/;
    }
    
    $_="k k k";
    if (/^([^ ]+) +(\1)/){
    print  unless /^([^ ]+) +(\1) \1/;
    }
    for now - only one line longer (the process is repeated 3 times just for experimentation; in practice that would not be there)

    Code:
    $_="a b c";
    if (/^([^ ]+) \1/){
    print  unless /^([^ ]+) \1 \1/;
    }
    
    $_="b b c";
    if (/^([^ ]+) \1/){
    print  unless /^([^ ]+) \1 \1/;
    }
    
    $_="k k k";
    if (/^([^ ]+) \1/){
    print  unless /^([^ ]+) \1 \1/;
    }
    there it is, cleaned up even more and now one more clean to minimize the space used...

    Code:
    $_="a b c";
    if (/^(\w+) \1/){
    print  unless /^(\w+) \1 \1/;
    }
    
    $_="b b c";
    if (/^(\w+) \1/){
    print  unless /^(\w+) \1 \1/;
    }
    
    $_="k k k";
    if (/^(\w+) \1/){ 
    print  unless /^(\w+) \1 \1/;
    }
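The three repeated blocks above can be folded into one loop. This is only a sketch using the same two patterns. The sub name `doubled_not_tripled` is made up for illustration.

```perl
use strict;
use warnings;

# Two-step test from the post: the first two words must be identical,
# and the third must not repeat them.
sub doubled_not_tripled {
    my ($text) = @_;
    return 0 unless $text =~ /^(\w+) \1/;     # step 1: word repeated once
    return $text =~ /^(\w+) \1 \1/ ? 0 : 1;   # step 2: but not a third time
}

for my $text ("a b c", "b b c", "k k k") {
    print "$text\n" if doubled_not_tripled($text);
}
```

As in the post, only `b b c` is printed.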
    Last edited by RabidMango; 07-21-2009 at 02:22 AM.

  • #5
    Banned
    Join Date
    Jul 2009
    Posts
    52
    Thanks
    10
    Thanked 4 Times in 4 Posts
    I haven't understood how to use the lookahead and lookbehind wotsits yet, clearly. Will have to deal with that tomorrow. At least I found a way to do what I wanted, though. Shame it took two lines of perl instead of one. Maybe I'll figure out how to do it in one soon.

  • #6
    Banned
    Join Date
    Jul 2009
    Posts
    52
    Thanks
    10
    Thanked 4 Times in 4 Posts
    Someone showed me the one I wanted, I thought I'd tried it, but I'd obviously got it a bit wrong...

    Code:
    $_="a b c";
    print if /^([^ ]+) +(\1) +(?!\1)/;
    
    $_="b b c";
    print if /^([^ ]+) +(\1) +(?!\1)/;
    
    $_="k k k";
    print if /^([^ ]+) +(\1) +(?!\1)/;
    yep the output worked as I wanted it. nice.

    I've neatened it up to this, and it still works fine (and only outputs b b c)...
    Code:
    $_="a b c";
    print if /^([^ ]+) \1 (?!\1)/; 
    
    $_="b b c";
    print if /^([^ ]+) \1 (?!\1)/; 
    
    $_="k k k";
    print if /^([^ ]+) \1 (?!\1)/;
    (or indeed /^(\w+) \1 (?!\1)/ which I tested and it works fine)
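One caveat, sketched under the assumption that whole words are what matters (as in the sheep/cow example from the first post): `(?!\1)` only checks that the third field does not start with the first word, so "sheep sheep sheepdog" is rejected too. The variant with `\b` below is a possible refinement, not something from the thread. The names `$bounded` and `accepts` are made up.

```perl
use strict;
use warnings;

my $original = qr/^(\w+) \1 (?!\1)/;      # pattern from the post
my $bounded  = qr/^(\w+) \1 (?!\1\b)/;    # hypothetical variant: \b ends the word

# Hypothetical helper: returns 1 if the pattern matches the text, else 0.
sub accepts {
    my ($re, $text) = @_;
    return $text =~ $re ? 1 : 0;
}

for my $text ("b b c", "k k k", "sheep sheep sheepdog") {
    printf "%-22s original=%d bounded=%d\n",
        $text, accepts($original, $text), accepts($bounded, $text);
}
```

Both patterns accept "b b c" and reject "k k k". They differ only on "sheep sheep sheepdog", which the bounded variant accepts.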

    Thanks for all assistance

    NB: I am told the if/unless solution is much better in terms of processing efficiency, because lookahead is "computationally expensive" (it can get expensive looking ahead at every character position in a large string), so bear that in mind, anyone.
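Rather than taking the efficiency claim on trust, it can be measured. The sketch below uses the core Benchmark module to compare the two approaches on the thread's three test strings. The sub names and the sample list are made up for illustration, and timings will vary by machine and Perl version. Note that both patterns are anchored with `^`, so the lookahead is only reached at one place in each string.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @samples = ("a b c", "b b c", "k k k");

# Two-step if/unless approach from post #4.
sub two_step {
    my ($text) = @_;
    return 0 unless $text =~ /^(\w+) \1/;
    return $text =~ /^(\w+) \1 \1/ ? 0 : 1;
}

# Single-pattern negative lookahead approach from post #6.
sub lookahead {
    my ($text) = @_;
    return $text =~ /^(\w+) \1 (?!\1)/ ? 1 : 0;
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    two_step  => sub { two_step($_)  for @samples },
    lookahead => sub { lookahead($_) for @samples },
});
```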
    Last edited by RabidMango; 07-21-2009 at 04:13 PM.

