
Thread: Text Selector

  1. #1
    Regular Coder
    Join Date
    Jul 2011
    Posts
    272
    Thanks
    63
    Thanked 1 Time in 1 Post

    Text Selector

    1) Project Details: (be as specific as possible): This project is outside my area of knowledge.

    I would like to create a program or some code I can put on my page, or something along those lines, that will visit a URL that I input, select certain text from the page, and insert the data into an SQL table called "offers".

    I want it to visit a site and select the text after "Campaign Name", "Requirements", "Country", "Rate", "Category" and "URL".

    All of the above are in a new <td> right after the <td> that holds the label (Campaign Name and so on), EXCEPT URL, which is in a DIV TABLE.

    Here is a screenshot of one of the pages I would like it to extract info from: http://snpr.cm/BOUlNk.png

    COPY OF ONE OF THE PAGES SOURCE CODE: http://pastebin.com/raw.php?i=9QgCHMk8

    If you need any more info please don't hesitate to ask. This is not in my field of knowledge which is why I've come here to ask you guys.

    2) Payment method/ details (Paypal, check? Timeline?): Free? I have no money.
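
    For the "insert into an SQL table called offers" part, here is a minimal sketch using PDO with a prepared statement. The column names and the table layout are assumptions (the thread never shows the real schema), and an in-memory SQLite database stands in for whatever database is actually used so the example is self-contained.

```php
<?php
// Sketch only: the column names below are assumptions, not taken from the thread.
function save_offer(PDO $pdo, array $offer): int
{
    // Named placeholders keep scraped text from being interpreted as SQL.
    $sql = 'INSERT INTO offers (campaign_name, requirements, country, rate, category, url)
            VALUES (:campaign_name, :requirements, :country, :rate, :category, :url)';
    $stmt = $pdo->prepare($sql);
    $stmt->execute([
        ':campaign_name' => $offer['campaign_name'],
        ':requirements'  => $offer['requirements'],
        ':country'       => $offer['country'],
        ':rate'          => $offer['rate'],
        ':category'      => $offer['category'],
        ':url'           => $offer['url'],
    ]);
    return (int) $pdo->lastInsertId();
}

// In-memory SQLite database used purely for illustration; swap the DSN for a real server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE offers (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    campaign_name TEXT, requirements TEXT, country TEXT,
    rate TEXT, category TEXT, url TEXT
)');

// Made-up values standing in for the scraped fields.
$id = save_offer($pdo, [
    'campaign_name' => 'Example Offer',
    'requirements'  => 'Valid email submit',
    'country'       => 'US',
    'rate'          => '1.50',
    'category'      => 'Email Submits',
    'url'           => 'http://www.example.com/offer',
]);
echo "Inserted offer with id $id\n";
```

    With MySQL the same function should work unchanged once the DSN, user name and password are passed to the PDO constructor.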

  • #2
    Senior Coder Rowsdower!'s Avatar
    Join Date
    Oct 2008
    Location
    Some say it's everything.
    Posts
    2,027
    Thanks
    5
    Thanked 397 Times in 390 Posts
    Here is a really sloppy and convoluted script:

    PHP Code:
    <?php
    $string=file_get_contents('http://www.example.com/path/to/page.html');
    $start='<table';
    $end='</table>';

    // Cut the page down to the table that starts at the "ID" cell, then tidy the whitespace.
    $string=substr($string,strpos($string,$start),strrpos($string,$end)-strpos($string,$start));
    $string='<table><tr>'.substr($string,strpos($string,'<td width="30%" align="right"><b>ID</b></td>'));
    $string=substr($string,0,strpos($string,$end))."</table>";
    $string=str_replace('  ',' ',str_replace('  ',' ',str_replace("\n\n","\n",$string)));

    print $string;

    // Parse the trimmed table and collect every <td>.
    libxml_use_internal_errors(TRUE);
    $dom = new DOMDocument();
    $dom->loadHTML($string);
    $xml = simplexml_import_dom($dom);
    libxml_use_internal_errors(FALSE);

    $result = $xml->xpath("//td");
    // Note: each() was deprecated in PHP 7.2 and removed in PHP 8.0,
    // so the lines below only run as written on older PHP versions.
    //print_r(each($result[5]));
    $temp=each($result[5]);
    print "<p>Campaign Name: ".$temp[1][0]."</p>\n";
    $temp=each($result[9]);
    print "<p>Description: ".$temp[1]."</p>\n";
    $temp=each($result[11]);
    print "<p>Requirements: ".$temp[1]."</p>\n";
    $temp=each($result[13]);
    print "<p>Category: ".$temp[1]."</p>\n";
    $temp=each($result[15]);
    print "<p>Country: ".$temp[1]."</p>\n";
    $temp=each($result[17]);
    print "<p>Rate: ".$temp[1]."</p>\n";
    ?>
    It might not work on anything other than your specified source code (other offers may be structured differently, I wouldn't know) but it's a start... And you get what you pay for!
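
    Since fixed cell positions like $result[5] break as soon as another offer page has an extra row, here is a hedged alternative sketch that looks each value up by its label instead. It uses DOMXPath to find the <td> holding the label and takes the <td> right after it. The helper name value_after_label() and the sample markup are made up for illustration; the real pages may be structured differently.

```php
<?php
// Hypothetical helper: returns the text of the <td> that follows the <td> containing $label,
// or null when the label is not on the page. Assumes $label contains no double quotes.
function value_after_label(DOMXPath $xpath, string $label): ?string
{
    // normalize-space(.) trims the label cell and collapses runs of whitespace.
    $query = sprintf('//td[normalize-space(.) = "%s"]/following-sibling::td[1]', $label);
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up sample markup, loosely modelled on the description in post #1.
$html = '<table>
<tr><td width="30%" align="right"><b>Campaign Name</b></td><td>Example Offer</td></tr>
<tr><td align="right"><b>Country</b></td><td> US </td></tr>
<tr><td align="right"><b>Rate</b></td><td>$1.50</td></tr>
</table>';

libxml_use_internal_errors(true);   // real-world HTML is rarely valid; keep warnings quiet
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

foreach (['Campaign Name', 'Country', 'Rate', 'Category'] as $label) {
    $value = value_after_label($xpath, $label);
    echo $label . ': ' . ($value ?? '(not found)') . "\n";
}
```

    A missing label comes back as null rather than shifting every later field by one, which is the main reason to prefer label lookups over index positions here.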
    The object of opening the mind, as of opening the mouth, is to shut it again on something solid. –G.K. Chesterton
    See Mediocrity in its Infancy
    It's usually a good idea to start out with this at the VERY TOP of your CSS: * {border:0;margin:0;padding:0;}
    Seek and you shall find... basically:
    validate your markup | view your page cross-browser/cross-platform | free web tutorials | free hosting

  • Users who have thanked Rowsdower! for this post:

    markman641 (11-21-2011)

  • #3
    Regular Coder
    Join Date
    Jul 2011
    Posts
    272
    Thanks
    63
    Thanked 1 Time in 1 Post
    I just thought of something... I need to be logged into the site to access the page and I don't know if the script would be able to go to the URL... DX

    So I'm not too sure if that would quite work, but I will see.

    Edit: doesn't work, this is the error I got: http://snpr.cm/gyF8yE.png
    Last edited by markman641; 11-19-2011 at 01:27 AM.

  • #4
    Regular Coder
    Join Date
    Jul 2011
    Posts
    272
    Thanks
    63
    Thanked 1 Time in 1 Post
    anyone?

  • #5
    Senior Coder Rowsdower!'s Avatar
    Join Date
    Oct 2008
    Location
    Some say it's everything.
    Posts
    2,027
    Thanks
    5
    Thanked 397 Times in 390 Posts
    You may not have the ability to use file_get_contents() on a remote address with your host. What happens if you just try this:

    PHP Code:
    <?php
    $string=file_get_contents('http://www.google.com/');
    $start='<table';
    $end='</table>';

    $string=substr($string,strpos($string,$start),strrpos($string,$end)-strpos($string,$start));
    $string='<table><tr>'.substr($string,strpos($string,'<td width="30%" align="right"><b>ID</b></td>'));
    $string=substr($string,0,strpos($string,$end))."</table>";
    $string=str_replace('  ',' ',str_replace('  ',' ',str_replace("\n\n","\n",$string)));

    print $string;
    ?>
    Does anything show up in the page or is it blank?
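
    A blank page on its own does not say whether the fetch failed or the string functions trimmed everything away. As a rough sketch, the fetch can be checked separately: file_get_contents() returns false on failure, and error_get_last() gives the reason. The helper name fetch_or_explain() is made up, and the demo reads a local path that does not exist so it runs without network access; a URL goes in the same place.

```php
<?php
// Sketch: fetch a URL (or local path) and return the failure reason instead of nothing.
function fetch_or_explain(string $source): array
{
    error_clear_last();
    // @ hides the on-screen warning; the message is still available via error_get_last().
    $body = @file_get_contents($source);
    if ($body === false) {
        $error = error_get_last();
        return ['ok' => false, 'body' => '', 'error' => $error['message'] ?? 'unknown error'];
    }
    return ['ok' => true, 'body' => $body, 'error' => null];
}

// Demo with a path that should not exist, so the failure branch is shown.
$result = fetch_or_explain(__DIR__ . '/definitely-missing-page.html');
if ($result['ok']) {
    echo strlen($result['body']) . " bytes fetched\n";
} else {
    echo 'Fetch failed: ' . $result['error'] . "\n";
}
```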

  • #6
    Regular Coder
    Join Date
    Jul 2011
    Posts
    272
    Thanks
    63
    Thanked 1 Time in 1 Post
    It shows up blank

  • #7
    Regular Coder
    Join Date
    Jul 2011
    Posts
    272
    Thanks
    63
    Thanked 1 Time in 1 Post
    anyone?

  • #8
    Senior Coder Rowsdower!'s Avatar
    Join Date
    Oct 2008
    Location
    Some say it's everything.
    Posts
    2,027
    Thanks
    5
    Thanked 397 Times in 390 Posts
    One more try... If this most basic example returns a blank result in your browser then your host simply doesn't allow remote use of file_get_contents() in which case you can't do what you are wanting to do...

    PHP Code:
    <?php
    $string=file_get_contents('http://www.google.com/');
    print $string;
    ?>
    Try that and if you get a blank result then you know you're hosed. If not, then the script I provided earlier needs some work or else you need to find another script to do the job.

    But whatever you do, you will need to be able to use either the cURL library or file_get_contents() on a remote address. Those are the two usual ways to get another website's content onto your server on the fly.
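
    Both options can also be checked directly rather than inferred from a blank page. A small sketch, assuming nothing beyond standard PHP: file_get_contents() can only open http:// URLs when the allow_url_fopen setting is on, and curl_init() only exists when the cURL extension is loaded. The function name remote_fetch_options() is made up.

```php
<?php
// Sketch: report which remote-fetch methods this PHP installation offers.
function remote_fetch_options(): array
{
    return [
        // allow_url_fopen must be enabled for file_get_contents() to read http:// URLs.
        'file_get_contents' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // curl_init() is only defined when the cURL extension is loaded.
        'curl' => function_exists('curl_init'),
    ];
}

foreach (remote_fetch_options() as $method => $available) {
    echo $method . ': ' . ($available ? 'available' : 'not available') . "\n";
}
```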

  • #9
    Regular Coder
    Join Date
    Jul 2011
    Posts
    272
    Thanks
    63
    Thanked 1 Time in 1 Post
    That worked! Which means your other script didn't work.

  • #10
    Senior Coder Rowsdower!'s Avatar
    Join Date
    Oct 2008
    Location
    Some say it's everything.
    Posts
    2,027
    Thanks
    5
    Thanked 397 Times in 390 Posts
    Quote Originally Posted by markman641 View Post
    That worked! Which means your other script didn't work.
    OK then. At least you're past the first hurdle.

    I know that when I plugged in your sampled source code from your target page my script ran just fine. So if your source code was representative of the actual source code you encounter then my script should work. (If not, then you need to try adjusting the substring and string replacement functions until you narrow things down to the result you want.)

    Not to ask an insulting question, but you did update this line to use the actual URL you want to scrape, didn't you?

    Code:
    <?php
    $string=file_get_contents('http://www.example.com/path/to/page.html');
    $start='<table'; 
    $end='</table>'; 
    
    ...
    And my script assumes that the page you want to scrape is not behind a login or anything requiring a cookie. Because if you have to log in to see the screen that you want to scrape (or if you have to have a certain value set in a cookie) then this method won't be able to actually see the data you are trying to collect. You would need to use the cURL library instead. Do you need a cookie or a login to see the page you are trying to scrape?

  • #11
    Regular Coder
    Join Date
    Jul 2011
    Posts
    272
    Thanks
    63
    Thanked 1 Time in 1 Post
    Yes, I changed it. And yes, it does need a login, but I figured if I logged in and then used the script it would work... maybe not.

  • #12
    Senior Coder Rowsdower!'s Avatar
    Join Date
    Oct 2008
    Location
    Some say it's everything.
    Posts
    2,027
    Thanks
    5
    Thanked 397 Times in 390 Posts
    Quote Originally Posted by markman641 View Post
    Yes, I changed it. And yes, it does need a login, but I figured if I logged in and then used the script it would work... maybe not.
    Yeah, definitely not. Your host's server (not your logged-in browser on your own computer) is visiting the page and your server does not share your session/cookies. It's like any other random user trying to visit the page from another computer while you're logged in. It's going to hit a login wall.

    You need to look into PHP's cURL library (which has the ability to make the server visit the page, simulating a real user, and log in/navigate pages). Then you capture contents from the logged-in state and log back out when you're finished with the capture.

    In order to do the cURL method one would have to have access to a valid account on the target site so they could see the way the log-in works and what things need to be "clicked" on and submitted in order to get around. Bottom line: I don't think you have much hope of getting a cURL script with login done for you for free.

    My advice would be to save up for a few weeks and post a paid work offer for someone to do this for you, or else spend that same amount of time (or less) learning to use cURL on your own. If you can at least get the script logged in and grab the page data you want, then you can script-bash that with what I have provided already to get a working model. The cURL part may or may not be messy (depending on how your target site's login system is set up). The cURL library in and of itself is not difficult to use, but navigating a website with cURL can get very tricky (and can break when the target site updates its code, for example if the login system's URL or variable names change). If the login uses JavaScript then that can be another, possibly significant, layer of trouble to work out.

    Anyway, this link may be of some help to get you started with cURL log-ins:
    http://stackoverflow.com/questions/1...-in-to-website
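
    To make the log in / capture / finish flow described above a bit more concrete, here is a minimal sketch. The function names, URLs and form field names are placeholders, not the target site's real ones, and fetch_behind_login() needs the cURL extension plus a real login form to do anything useful. The key idea is that one cURL handle with the cookie engine switched on is reused for both requests, so the session cookie from the login carries over to the page fetch.

```php
<?php
// Pure helper: turns form fields into a URL-encoded POST body.
function build_post_body(array $fields): string
{
    return http_build_query($fields, '', '&');
}

// Sketch of the flow: POST the login form, then GET the protected page on the same handle.
// Requires the cURL extension; it is defined here but not called in this example.
function fetch_behind_login(string $loginUrl, array $fields, string $pageUrl): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow the redirect most logins send
        CURLOPT_COOKIEFILE     => '',    // empty string turns on in-memory cookie handling
    ]);

    // Step 1: log in.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_post_body($fields));
    if (curl_exec($ch) === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('Login request failed: ' . $error);
    }

    // Step 2: fetch the page, switching the handle back to GET.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    $page = curl_exec($ch);
    $error = curl_error($ch);
    curl_close($ch);

    if ($page === false) {
        throw new RuntimeException('Page request failed: ' . $error);
    }
    return $page;
}

// The POST body is the only part that can be shown without a real site to log in to.
echo build_post_body(['username' => 'demo user', 'password' => 'p&ss=word']), "\n";
```

    Building the body with http_build_query() rather than by hand means characters such as & or = inside a password are encoded instead of silently splitting the field.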

  • #13
    Regular Coder
    Join Date
    Jul 2011
    Posts
    272
    Thanks
    63
    Thanked 1 Time in 1 Post
    This is what I found by searching the internet, but it's not working. I get the error: Also, am I supposed to have a cookie.txt file? Here is the code:

    PHP Code:
    <?php
    // INIT CURL
    $ch = curl_init();

    // SET URL FOR THE POST FORM LOGIN
    curl_setopt($ch, CURLOPT_URL, 'http://proleadsmedia.com/publishers/login.php');

    // ENABLE HTTP POST
    curl_setopt ($ch, CURLOPT_POST, 1);

    // SET POST PARAMETERS : FORM VALUES FOR EACH FIELD
    curl_setopt ($ch, CURLOPT_POSTFIELDS, 'Username=********&Password=******');

    // IMITATE CLASSIC BROWSER'S BEHAVIOUR : HANDLE COOKIES
    curl_setopt ($ch, CURLOPT_COOKIEJAR, 'cookie.txt');

    # Setting CURLOPT_RETURNTRANSFER variable to 1 will force cURL
    # not to print out the results of its query.
    # Instead, it will return the results as a string return value
    # from curl_exec() instead of the usual true/false.
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);

    // EXECUTE 1st REQUEST (FORM LOGIN)
    $store = curl_exec ($ch);

    // SET FILE TO DOWNLOAD
    curl_setopt($ch, CURLOPT_URL, 'http://proleadsmedia.com/publishers/campaigns/view.php?wid=592&cid=4811');

    // EXECUTE 2nd REQUEST (FILE DOWNLOAD)
    $content = curl_exec ($ch);

    $url = "http://proleadsmedia.com/publishers/campaigns/view.php?wid=592&cid=4811";

    //unique text to determine start goes here
    $start = "start.txt";

    //insert end text here
    $end = "end.txt";

    $ch = curl_init();
    curl_setopt ($ch, CURLOPT_URL, $url );
    curl_setopt ($ch, CURLOPT_HEADER, 0);
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);

    $result = curl_exec ($ch) or die ("Couldn't connect to $url.");

    curl_close ($ch);

    $startposition = strpos($result,$start);

    if($startposition > 0){

    $endposition = strpos($result,$end, $startposition);

    //add enough chars to include the tag
    $endposition += strlen($end);

    $length = $endposition-$startposition;

    $result = substr($result,$startposition,$length);

    echo $result;

    }else

    echo "<center><h3>Not found - try again later.</h3></center>";

    // CLOSE CURL
    curl_close ($ch);

    ?>


    BUT THEN I also just tried:

    Code:
    <?
    
    $loginUrl = 'http://proleadsmedia.com/publishers/login.php'; //action from the login form
    $loginFields = array('username'=>'m********', 'password'=>'********'); //login form field names and values
    $remotePageUrl = 'http://proleadsmedia.com/publishers/campaigns/view.php?wid=&cid=12462'; //url of the page you want to save  
    
    $login = getUrl($loginUrl, 'post', $loginFields); //login to the site
    
    $remotePage = getUrl($remotePageUrl); //get the remote page
    
    function getUrl($url, $method='', $vars='') {
        $ch = curl_init();
        if ($method == 'post') {
            curl_setopt($ch, CURLOPT_POST, 1);
            curl_setopt($ch, CURLOPT_POSTFIELDS, $vars);
        }
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies/cookies.txt');
        curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies/cookies.txt');
        $buffer = curl_exec($ch);
        curl_close($ch);
        return $buffer;
    }
    
    ?>
    and it came up as a blank page
    Last edited by Fumigator; 12-05-2011 at 11:24 PM. Reason: removed password
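
    Two observations on the second script, offered as likely rather than confirmed causes. First, it stores the fetched page in $remotePage but never prints it, so a blank page is what it would show even if the login succeeded. Second, on the cookie file question: cURL writes the file named in CURLOPT_COOKIEJAR itself when the handle is closed, so it does not have to exist beforehand, but the directory does need to exist and be writable. A minimal sketch of that directory check follows; the helper name prepare_cookie_jar() and the temp-directory location are assumptions for illustration.

```php
<?php
// Hypothetical helper: make sure the cookie directory exists and is writable,
// then return the full path to use for CURLOPT_COOKIEJAR / CURLOPT_COOKIEFILE.
function prepare_cookie_jar(string $dir, string $file = 'cookies.txt'): string
{
    if (!is_dir($dir) && !mkdir($dir, 0700, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create cookie directory: $dir");
    }
    if (!is_writable($dir)) {
        throw new RuntimeException("Cookie directory is not writable: $dir");
    }
    return rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $file;
}

// Demo location under the system temp directory (an arbitrary choice for this example).
$jar = prepare_cookie_jar(sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'scraper_cookies');
echo "Cookie jar path: $jar\n";

// In the script above, the fetched page would then have to be printed explicitly:
// echo $remotePage;
```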

  • #14
    Senior Coder Rowsdower!'s Avatar
    Join Date
    Oct 2008
    Location
    Some say it's everything.
    Posts
    2,027
    Thanks
    5
    Thanked 397 Times in 390 Posts
    Two things:

    1) Change your proleads password immediately. You forgot to delete it in one instance in your posted code (I know, I accidentally accessed it once and had to log out when I was testing).

    2) You missed one piece of the proper URL that the login form goes to. Try updating your initial setup with this:

    Code:
    <?php
    // INIT CURL
    $ch = curl_init();
    
    // SET URL FOR THE POST FORM LOGIN
    curl_setopt($ch, CURLOPT_URL, 'http://proleadsmedia.com/publishers/login.php?next');
    
    // ENABLE HTTP POST
    curl_setopt ($ch, CURLOPT_POST, true);
    
    // SET POST PARAMETERS : FORM VALUES FOR EACH FIELD
    curl_setopt ($ch, CURLOPT_POSTFIELDS, 'username=markman641&password=************');
    
    // IMITATE CLASSIC BROWSER'S BEHAVIOUR : HANDLE COOKIES
    curl_setopt ($ch, CURLOPT_COOKIEJAR, './cookie.txt');
    
    # Setting CURLOPT_RETURNTRANSFER variable to 1 will force cURL
    # not to print out the results of its query.
    # Instead, it will return the results as a string return value
    # from curl_exec() instead of the usual true/false.
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
    
    // EXECUTE 1st REQUEST (FORM LOGIN)
    $store = curl_exec ($ch);
    
    // SET FILE TO DOWNLOAD
    curl_setopt($ch, CURLOPT_URL, 'http://proleadsmedia.com/publishers/campaigns/view.php?wid=592&cid=4811');
    
    // EXECUTE 2nd REQUEST (FILE DOWNLOAD)
    $content = curl_exec ($ch);
    
    
    
    
    
    $url = "http://proleadsmedia.com/publishers/campaigns/view.php?wid=592&cid=4811";
    
    //unique text to determine start goes here
    
    $start = "start.txt";
    
    //insert end text here
    
    $end = "end.txt";
    
    $ch = curl_init();
    
    curl_setopt ($ch, CURLOPT_URL, $url );
    
    curl_setopt ($ch, CURLOPT_HEADER, 0);
    
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
    
    $result = curl_exec ($ch) or die ("Couldn't connect to $url.");
    
    curl_close ($ch);
    
    $startposition = strpos($result,$start);
    
    if($startposition > 0){
    
    $endposition = strpos($result,$end, $startposition);
    
    //add enough chars to include the tag
    
    $endposition += strlen($end);
    
    $length = $endposition-$startposition;
    
    $result = substr($result,$startposition,$length);
    
    echo $result;
    
    }else
    
    echo "<center><h3>Not found - try again later.</h3></center>";
    
    
    
    
    
    
    
    
    // CLOSE CURL
    curl_close ($ch);
    
    ?>
    That worked for me in a quick test (or at least, it logged me in and got me to the first detail page in the script and printed the contents after which I used exit(0) to prevent any further processing).

    After that, you have at least got a working login and the ability to navigate. You should be able to patch things up from that point.
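
    For the patching-up step, the start/end marker extraction in the script can be wrapped in a small function. This is a sketch with a made-up name, extract_between(), and made-up sample markup. One detail worth handling: strpos() returns 0 when the marker sits at the very start of the string and false when it is missing, so a check like $startposition > 0 treats a match at position 0 as "not found". Comparing against false with !== avoids that.

```php
<?php
// Sketch: return the part of $haystack from $start through $end (both included),
// or null when either marker is missing.
function extract_between(string $haystack, string $start, string $end): ?string
{
    $startPos = strpos($haystack, $start);
    if ($startPos === false) {          // strict check: position 0 is a valid match
        return null;
    }
    // Look for the end marker only after the start marker.
    $endPos = strpos($haystack, $end, $startPos + strlen($start));
    if ($endPos === false) {
        return null;
    }
    return substr($haystack, $startPos, $endPos + strlen($end) - $startPos);
}

// Made-up page content; the start marker is at position 0 on purpose.
$page = '<table><tr><td>Rate</td><td>$1.50</td></tr></table><p>footer</p>';
echo extract_between($page, '<table', '</table>'), "\n";
```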
    Last edited by Rowsdower!; 12-05-2011 at 09:00 PM.

  • #15
    Senior Coder Rowsdower!'s Avatar
    Join Date
    Oct 2008
    Location
    Some say it's everything.
    Posts
    2,027
    Thanks
    5
    Thanked 397 Times in 390 Posts
    Short version, printing the page in question rather than processing it:

    PHP Code:
    <?php
    // INIT CURL
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, 'http://proleadsmedia.com/publishers/login.php?next');
    curl_setopt ($ch, CURLOPT_POST, true);
    curl_setopt ($ch, CURLOPT_POSTFIELDS, 'username=markman641&password=************');
    curl_setopt ($ch, CURLOPT_COOKIEJAR, './cookie.txt');
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);

    // EXECUTE 1st REQUEST (FORM LOGIN)
    $store = curl_exec ($ch);

    // SET FILE TO DOWNLOAD
    curl_setopt($ch, CURLOPT_URL, 'http://proleadsmedia.com/publishers/campaigns/view.php?wid=592&cid=4811');

    // EXECUTE 2nd REQUEST (FILE DOWNLOAD)
    $content = curl_exec ($ch);

    // LOG BACK OUT
    curl_setopt($ch, CURLOPT_URL, 'http://proleadsmedia.com/publishers/logout.php');
    $logged_out = curl_exec ($ch);

    // CLOSE CURL
    curl_close ($ch);

    print $content; //instead of printing in real application you would search the contents string for the data you need...
    ?>
    Last edited by Rowsdower!; 12-05-2011 at 08:58 PM.

