  1. #1
    New Coder
    Join Date
    May 2003
    Location
    in a small damp cupboard
    Posts
    37

    extracting information from an <a> tag using regexp

    Ok, here's the dealio

    I have a string which has a load of <a href="http://domain.tld" title="detailed description">link text</a> style things in it, one per line

    How would I extract:

    1) the URL
    2) the value of title attribute
    3) the link text

    I figured regular expressions are the way to go, but I'm a little confused on where to start!

    Any pointers? I came up with this:

    PHP Code:
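    <?php
    function extractLink($link) {
        // One link per line. split() was removed in PHP 7; explode() does the same job here.
        $link = explode("\n", trim($link));
        for ($i = 0; isset($link[$i]); $i++) {
            // Cut the tag into pieces at each double quote
            $link[$i] = explode("\"", $link[$i]);
            // Piece 1 is the href value; drop the leading "http://"
            $link[$i]['url'] = substr($link[$i][1], 7);
            // Piece 3 is the title attribute value
            $link[$i]['description'] = $link[$i][3];
            // Piece 4 is '>link text</a>'; drop the leading '>' and the trailing '</a>'
            $link[$i]['title'] = substr($link[$i][4], 1, -4);
        }
        for ($i = 0; isset($link[$i]); $i++) {
            foreach ($link[$i] as $key => $value) {
                if (is_numeric($key)) {
                    // Discard the raw pieces, keep only the named fields
                    unset($link[$i][$key]);
                } else {
                    $link[$i][$key] = htmlentities($value);
                }
            }
        }
        return $link;
    }
    ?>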
    Which, while crude, does the job, but it breaks if a link has no title attribute.

    Thanks in advance,

    MrJ
    Last edited by mrjamin; 04-15-2004 at 11:12 PM. Reason: added [php] tag

  • #2
    Senior Coder
    Join Date
    Feb 2004
    Posts
    1,206
    It would be better if we saw the actual string, that way we can help you from the start.

    Also, since the algorithm is completely dependent on the string input, it is imperative that we see that string.

    Sadiq.

  • #3
    Mega-ultimate member
    Join Date
    Jun 2002
    Location
    Winona, MN - The land of 10,000 lakes
    Posts
    1,855
    Hmm, just guessing here, but try...

    Code:
    preg_match_all("/<a\s{1,2}href=\"(.*?)\"\s{1,2}title=\"(.*?)\">(.*?)<\/a>/",$string,$matches);
    print_r($matches);
    I think that should work; mordred could probably clean it up a bit.

    I think the href will be in the $matches[1] array, the title in $matches[2] and the link text in $matches[3], but I didn't test it. That's why I've got the print_r, which will recursively print the $matches array.
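
    For the case mentioned in the first post, where a link has no title attribute, the title part of the pattern can be made optional. The following is only a rough sketch under a few assumptions: the function name extractLinks is made up for illustration, attributes are double-quoted, and href always comes before title, as in the sample line. PREG_SET_ORDER is used so that each match comes back as one array per link.

```php
<?php
// Hypothetical helper (not from the thread): returns one array per link.
function extractLinks(string $html): array
{
    // href is required; the title group is wrapped in (?: ... )? so it may be absent.
    // Assumes double-quoted attributes, with href appearing before title.
    $pattern = '/<a\s+href="([^"]*)"(?:\s+title="([^"]*)")?\s*>(.*?)<\/a>/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        $links[] = [
            'url'   => $m[1],
            'title' => $m[2], // empty string when the link has no title attribute
            'text'  => $m[3],
        ];
    }
    return $links;
}

$sample = "<a href=\"http://domain.tld\" title=\"detailed description\">link text</a>\n"
        . "<a href=\"http://example.org\">no title here</a>";
print_r(extractLinks($sample));
```

    Each entry then has named keys for the URL, title and link text, so no separate clean-up loop is needed. If the output is going into a page, running each value through htmlentities() as in the first post would still apply.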

