  • #1
    Regular Coder
    Join Date
    Aug 2006
    Location
    UK, London, Dartford
    Posts
    221
    Thanks
    3
    Thanked 14 Times in 14 Posts

    Getting information from URLs

    Hey guys, I managed to get cURL working with POST, but now that it retrieves the content I would only like to extract certain information from the URLs in it.

    Example:
    <a href="profile.php?id=226345">WildestThing</a>

    There would normally be 20+ links, each with a different ID and user name.

    I would like it to list like this:

    WildestThing - 226345
    WildestThing1 - 226346

    So on and on.

    I've thought of eregi, but I'm not really sure what regex to use or how to get it from the URL/HTML.

    Any help would be great!

    Thank You!!

  • #2
    Senior Coder kbluhm's Avatar
    Join Date
    Apr 2007
    Location
    Philadelphia, PA, USA
    Posts
    1,509
    Thanks
    3
    Thanked 258 Times in 254 Posts
    First off, don't use eregi(), or any functions from PHP's ereg library for that matter. They are slow and have effectively been deprecated in favor of the preg (PCRE) library; they were slated for removal in PHP 6 and were ultimately removed in PHP 7.0.

    You'll want to have a look at preg_match_all().

    Try running this function and see what you come up with:
    PHP Code:
    /**
     * parse_links()
     * Returns the number of matches on success, or boolean FALSE if no matches.
     * Assigns matches to second parameter's variable name
     */
    function parse_links( $input, & $matches = NULL )
    {
        $regexp = '/\<a.+href\="profile\.php\?id\=(\d+)".*\>(.+)\<\/a\>/Usi';
        $count  = preg_match_all( $regexp, $input, $m, PREG_SET_ORDER );
        if ( $count )
        {
            $matches = array();
            for ( $i = 0; $i < $count; $i++ )
            {
                $matches[] = array
                (
                    'id'   => $m[$i][1],
                    'name' => $m[$i][2],
                );
            }
            return $count;
        }
        return FALSE;
    }

    Usage:
    PHP Code:
    $source = file_get_contents( $url ); // however you get the HTML source

    if ( parse_links( $source, $matches ) )
    {
        print_r( $matches );
    }
    else
    {
        echo 'No matches';
    }

    It should give you something like so:
    Code:
    Array
    (
        [0] => Array
            (
                [id] => 226345
                [name] => WildestThing
            )
    
        [1] => Array
            (
                [id] => 226346
                [name] => WildestThing1
            )
    
        [2] => Array
            (
                [id] => 226347
                [name] => WildestThing2
            )
    
        [3] => Array
            (
                [id] => 226348
                [name] => WildestThing3
            )
    
        [4] => Array
            (
                [id] => 226349
                [name] => WildestThing4
            )
    
    )
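    To get the "name - id" listing asked for in the first post, the same matches can be joined into lines. This is only a minimal sketch: format_links() and the sample $html string are hypothetical stand-ins, and it assumes the links look exactly like the example in the question.

```php
<?php
// Hypothetical sample input standing in for the HTML fetched with cURL.
$html = '<a href="profile.php?id=226345">WildestThing</a>'
      . '<a href="profile.php?id=226346">WildestThing1</a>';

// format_links() is an illustrative helper, not part of the thread's code.
// It returns an array of "name - id" strings, one per matched link.
function format_links( $html )
{
    $lines  = array();
    // Same pattern as in parse_links().
    $regexp = '/\<a.+href\="profile\.php\?id\=(\d+)".*\>(.+)\<\/a\>/Usi';
    if ( preg_match_all( $regexp, $html, $m, PREG_SET_ORDER ) )
    {
        foreach ( $m as $match )
        {
            // $match[1] is the id, $match[2] is the link text.
            $lines[] = $match[2] . ' - ' . $match[1];
        }
    }
    return $lines;
}

echo implode( "\n", format_links( $html ) ), "\n";
```

    With the sample input this prints "WildestThing - 226345" and "WildestThing1 - 226346" on separate lines.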
    Last edited by kbluhm; 02-20-2008 at 09:31 PM.

  • Users who have thanked kbluhm for this post:

    Lee Stevens (02-20-2008)

  • #3
    Regular Coder
    Thank you very much!

    But I managed to sort something out:
    PHP Code:
    if ( preg_match_all( '/<a href="profile\.php\?id=(\d+)">(.*?)<\/a>/i', $content, $matches, PREG_SET_ORDER ) )
    {
        foreach ( $matches as $line_num => $val )
        {
            $userinfo[$line_num]['userid']   = $val[1];
            $userinfo[$line_num]['username'] = $val[2];
        }
    }
    I was using cURL to get the information.

  • #4
    Senior Coder kbluhm's Avatar
    What was the something that you managed to sort out?

    Also, this bit of the regexp that you modified...
    Code:
    (.*?)
    ... is redundant. The asterisk says zero or more. The question mark says optional (zero or one). There is no reason for the question mark when using the asterisk.

  • #5
    Master Coder
    Join Date
    Dec 2007
    Posts
    6,682
    Thanks
    436
    Thanked 890 Times in 879 Posts
    Originally posted by kbluhm:
    Also, this bit of the regexp that you modified...
    Code:
    (.*?)
    ... is redundant. The asterisk says zero or more. The question mark says optional (zero or one). There is no reason for the question mark when using the asterisk.
    There is a reason: a question mark after a quantifier makes it lazy (ungreedy) rather than optional. See PCRE_UNGREEDY, or the U modifier.
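    A small sketch of the difference, using a made-up input string with two links on one line. The patterns follow the one from post #3; the variable names are only for illustration.

```php
<?php
// Hypothetical input: two links on a single line.
$html = '<a href="profile.php?id=1">Ann</a> <a href="profile.php?id=2">Bob</a>';

// Greedy: (.*) runs to the LAST </a>, swallowing the second link.
preg_match( '/<a href="profile\.php\?id=(\d+)">(.*)<\/a>/i', $html, $greedy );

// Lazy: (.*?) stops at the FIRST </a>.
preg_match( '/<a href="profile\.php\?id=(\d+)">(.*?)<\/a>/i', $html, $lazy );

// U modifier: inverts greediness, so a plain (.*) behaves like (.*?).
preg_match( '/<a href="profile\.php\?id=(\d+)">(.*)<\/a>/Ui', $html, $ungreedy );

echo $greedy[2], "\n";   // Ann</a> <a href="profile.php?id=2">Bob
echo $lazy[2], "\n";     // Ann
echo $ungreedy[2], "\n"; // Ann
```

    Note that combining the U modifier with ? flips the quantifier back to greedy, so it is usually best to pick one of the two styles.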

    Edit:
    In my opinion it is better to use:
    Code:
    [^<]*
    instead of that, if you expect only text and no other HTML elements between the a tags.
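    A quick sketch of how the negated character class behaves, assuming the simple link format from post #3. The input string is invented for illustration; its second link wraps the name in a b tag to show the limitation.

```php
<?php
// Hypothetical input: the second link wraps its text in a <b> tag.
$html = '<a href="profile.php?id=1">Ann</a> <a href="profile.php?id=2"><b>Bob</b></a>';

// [^<]* matches any run of characters that are not "<", so it can never
// run past the closing tag and needs neither ? nor the U modifier.
preg_match_all( '/<a href="profile\.php\?id=(\d+)">([^<]*)<\/a>/i', $html, $m, PREG_SET_ORDER );

foreach ( $m as $match )
{
    // $match[1] is the id, $match[2] is the link text.
    echo $match[2], ' - ', $match[1], "\n";
}
```

    Only the first link matches here ("Ann - 1"); the link with nested markup is skipped, which is the trade-off of this pattern.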


    best regards
    Last edited by oesxyl; 02-20-2008 at 10:57 PM.

  • #6
    Senior Coder kbluhm's Avatar
    Oh, he also ripped out the ungreedy modifier, as well as some other changes. How nice.

