  1. #1
    New to the CF scene
    Join Date
    Sep 2008
    Posts
    9
    Thanks
    2
    Thanked 0 Times in 0 Posts

    Need to create a bot

    Hi all,

    Sorry if this is in the wrong section, but I know basic PHP, so if possible I would like to write the script I need in that language.

    We have a program which is being used by a number of websites. It's basically just a link through to our site, but each one is different. However, it will always contain 'mysite.com'.

    I have a list of domain names (approx. 7,000) and I need to check whether any of these contain the link to our site.

    Is there a way I can do this in PHP, or can someone recommend how to go about it?

    Many thanks!

    Matt

  • #2
    Regular Coder
    Join Date
    Dec 2009
    Location
    UK
    Posts
    495
    Thanks
    0
    Thanked 58 Times in 58 Posts
    Sure you can. You just need to load each site individually, check for the URL, and save the report to a file, the screen or a database. With that many domains you will need to raise the script's default time limit (30 seconds unless the host has changed it), or it will more than likely stop well short of your 7,000 domains.
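    A minimal sketch of that outline, assuming PHP 7.1 or later. The function name checkDomains, the report format (one tab-separated line per domain) and the $fetch callback are all hypothetical, chosen for illustration; $fetch stands for whatever you use to download a page and should return the HTML as a string, or null if the page could not be loaded.

```php
<?php
// Hypothetical helper: checks every domain for $needle and appends one
// tab-separated line per domain to the report file.
function checkDomains(array $domains, string $needle, callable $fetch, string $reportPath): array
{
    // Lift the execution time limit for this run. Some hosts disable this
    // function, hence the guard.
    if (function_exists('set_time_limit')) {
        set_time_limit(0);
    }

    $totals = ['found' => 0, 'notfound' => 0, 'error' => 0];
    foreach ($domains as $domain) {
        $html = $fetch($domain);
        if (!is_string($html)) {
            $status = 'error';      // page could not be loaded
        } elseif (stripos($html, $needle) !== false) {
            $status = 'found';      // case-insensitive substring match
        } else {
            $status = 'notfound';
        }
        $totals[$status]++;
        // Append straight away so partial results survive if the run is cut short.
        file_put_contents($reportPath, $domain . "\t" . $status . "\n", FILE_APPEND);
    }
    return $totals;
}
```

    Writing each result as soon as it is known means that if the script does get stopped partway through, you can remove the domains already in the report from the list and run it again.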
    My site: JayGilford.com
    Resources:
    PHP Pagination Class | Getting all page links | Handling PHP Errors properly
    If you like a users help, show your appreciation with the rep and thanks buttons :)

  • #3
    New to the CF scene
    Join Date
    Sep 2008
    Posts
    9
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Hi,

    Thanks for your response. Any idea how I would even start a script to open each page and search for the link?

    Sorry... like I say, my PHP knowledge is limited.

    Thanks!

    Matt

  • #4
    Regular Coder
    Join Date
    Dec 2009
    Location
    UK
    Posts
    495
    Thanks
    0
    Thanked 58 Times in 58 Posts
    to load page data -> file_get_contents() or fopen(), fread(), fclose()
    looping -> foreach
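    To make those function names concrete, here is a small sketch of both ways to load page data, plus a foreach loop over a list. The helper names (loadWithFileGetContents, loadWithFopen, loadAll) are made up for this example, and the code assumes PHP 7.1 or later. Fetching http:// URLs this way also needs allow_url_fopen to be enabled on the server.

```php
<?php
// Option 1: file_get_contents() reads the whole page in one call.
// Returns null when the source cannot be read.
function loadWithFileGetContents(string $source): ?string
{
    $data = @file_get_contents($source);
    return $data === false ? null : $data;
}

// Option 2: fopen() / fread() / fclose() read the page in chunks.
function loadWithFopen(string $source): ?string
{
    $handle = @fopen($source, 'r');
    if ($handle === false) {
        return null;
    }
    $data = '';
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        if ($chunk === false) {
            break;
        }
        $data .= $chunk;
    }
    fclose($handle); // a handle opened with fopen() should be closed
    return $data;
}

// Looping: foreach visits each entry in the list once.
function loadAll(array $sources): array
{
    $pages = [];
    foreach ($sources as $source) {
        $pages[$source] = loadWithFileGetContents($source);
    }
    return $pages;
}
```

    For a job like this, file_get_contents() is usually the simpler choice, since the whole page is needed in memory anyway before searching it.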


  • #5
    New to the CF scene
    Join Date
    Sep 2008
    Posts
    9
    Thanks
    2
    Thanked 0 Times in 0 Posts
    That's great, thanks for your help. I'll give this a try, and if I encounter any problems, I'm sure someone will be able to help.

    Thanks!

    Matt

  • #6
    New to the CF scene
    Join Date
    Sep 2008
    Posts
    9
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Hi,

    I'm wondering if someone can help me. Jay was very kind to give me some references, but I think this is too advanced for me.

    Can someone point me in the right direction of how this script would work exactly?

    Many thanks!

  • #7
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,513
    Thanks
    8
    Thanked 1,090 Times in 1,081 Posts
    Will your link always appear on the main page of those 7,000 sites?
    Crawling each site recursively (through all of its pages) would take a huge
    amount of server power and time.

  • #8
    New to the CF scene
    Join Date
    Sep 2008
    Posts
    9
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Yes, on 99% of them it will appear on the main front page, so I don't need it to crawl; I'm not worried about that margin of error.

    Many thanks!

    Matt

  • #9
    New Coder
    Join Date
    Feb 2010
    Location
    New Zealand
    Posts
    76
    Thanks
    7
    Thanked 10 Times in 9 Posts
    PHP Code:
    <?php
    $recip = 'http://www.domain.com'; // this is the reciprocal url... that EXACTLY must match
    $filename = 'links1.txt'; // File with sites where your link is supposed to be, 1 per line
    $found = 0;
    $notfound = 0;

    function backlinkCheck($siteurl, $recip) {
        if ($arrText = file($siteurl)) {
            for ($i = 0; $i < count($arrText); $i++) {
                $text = $text . $arrText[$i];
            }
            if (eregi($recip, $text)) {
                fclose($fd);
            } else {
                return false; // set false if backlink is missing
                fclose($fd);
            }
        }
        return false;
    }

    echo '<h2>Link Checker</h2>';
    echo '<p> This will check if the text ' . $recip . ' is found on the webpages</p><hr>';

    $file_contents = file($filename);
    for ($i = 0; $i < sizeof($file_contents); $i++) {
        $line = ($file_contents[$i]);
        $line = trim($line);
        $siteurl = $line;
        if (backlinkCheck($siteurl, $recip)) {
            echo '<p>Backlink was <b>FOUND</b> on: ' . $siteurl . "</p>\n\n";
            $found++;
        } else {
            echo '<p>Backlink was <b>NOT FOUND</b> on: ' . $siteurl . "</p>\n\n";
            $notfound++;
        }
    }
    echo 'Total Found ' . $found . '<br>';
    echo 'Total Not Found ' . $notfound . '<br>';
    echo 'Total Links Checked ' . ($notfound + $found) . '<br>';
    ?>
    Last edited by Azzaboi; 03-10-2010 at 10:54 PM.

  • #10
    New to the CF scene
    Join Date
    Sep 2008
    Posts
    9
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Hi,

    That's amazing, thank you. I have just tried it, however, and I get some errors appearing.

    It comes up with

    Code:
    Warning: file() [function.file]: URL file-access is disabled in the server configuration in /home/accounts/public_html/test1.php on line 15
    
    Warning: file(http://www.booking.com) [function.file]: failed to open stream: no suitable wrapper could be found in /home/accounts/public_html/test1.php on line 15
    On line 15 I have
    Code:
    if ($arrText = file($siteurl)){
    Any ideas?

    Thanks!

  • #11
    New Coder
    Join Date
    Feb 2010
    Location
    New Zealand
    Posts
    76
    Thanks
    7
    Thanked 10 Times in 9 Posts
    URL file-access is disabled in the server configuration
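    It's in your php.ini (the allow_url_fopen setting), but I don't recommend changing it.

    PHP itself ships with allow_url_fopen enabled, but many shared hosts switch it off due to security concerns. The main risk is remote file inclusion, where a script is tricked into loading code from an attacker's URL (the related allow_url_include setting has been off by default since PHP 5.2 for the same reason). In some cases, malicious users have even enslaved a server to become a spam-email-sending nightmare, all without the administrator noticing.

    You could use relative file paths instead and cut out the domain name altogether, but that only works for files on your own server, not for pages on other sites.

    The function 'eregi' is deprecated as of PHP 5.3; use 'preg_match' with the i modifier instead.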
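    As a rough sketch of both points, the snippet below checks whether URL access is enabled before trying to fetch anything, and replaces the eregi() call with preg_match(). The function names urlFopenEnabled and containsLink are invented for this example. Note that preg_quote() is needed because the dots in a URL are regex wildcards; without it 'www.domain.com' would also match 'wwwXdomain.com'.

```php
<?php
// True when file() / file_get_contents() are allowed to open URLs here.
function urlFopenEnabled(): bool
{
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

// Stand-in for eregi($recip, $text): case-insensitive search for the
// literal URL. preg_quote() escapes regex characters such as '.' and '/'.
function containsLink(string $text, string $recip): bool
{
    $pattern = '/' . preg_quote($recip, '/') . '/i';
    return preg_match($pattern, $text) === 1;
}
```

    Since the search is for a fixed string rather than a pattern, stripos($text, $recip) !== false would do the same job without any regex.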

  • #12
    New to the CF scene
    Join Date
    Sep 2008
    Posts
    9
    Thanks
    2
    Thanked 0 Times in 0 Posts
    Hi,

    Okay, I've temporarily activated "allow_url_fopen"; however, it still doesn't seem to be working.

    I now get an error reading
    Code:
    Warning: fclose(): supplied argument is not a valid stream resource in /home/accounts/public_html/test1.php on line 20
    I just did a simple one searching for http://news.bbc.co.uk on http://news.bbc.co.uk but it says it found no references.

    Thanks in advance!

  • #13
    New Coder
    Join Date
    Feb 2010
    Location
    New Zealand
    Posts
    76
    Thanks
    7
    Thanked 10 Times in 9 Posts
    For the backlinkCheck function:

    PHP Code:
    if (eregi($recip, $text)) {
        return true;
    } else {
        return false; // set false if backlink is missing
    }

    I'm not the best PHP coder, still new. What I was trying to do was use fclose to close an open file pointer before returning from the function, but file() never opens one, so the call isn't needed. Maybe someone else can provide more advanced code; it was just a quick example to get you started.
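    Putting the fix together, one possible version of the whole function might look like the sketch below (assuming PHP 7.0 or later for the type declarations). It also explains the earlier "not found" result: the first version called fclose($fd) on a variable that was never set and then fell through to return false, so it could never return true. Here stripos() is swapped in for eregi() as a plain case-insensitive substring search.

```php
<?php
// Returns true when $recip appears (case-insensitively) in the page at $siteurl.
function backlinkCheck(string $siteurl, string $recip): bool
{
    // file() returns the page as an array of lines, or false on failure.
    // It opens and closes the stream itself, so there is nothing to fclose().
    $lines = @file($siteurl);
    if ($lines === false) {
        return false; // page could not be loaded
    }
    $text = implode('', $lines);
    return stripos($text, $recip) !== false;
}
```

    The rest of the script can call it exactly as before, e.g. if (backlinkCheck($siteurl, $recip)) { ... }.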

  • #14
    Supreme Master coder! _Aerospace_Eng_'s Avatar
    Join Date
    Dec 2004
    Location
    In a place far, far away...
    Posts
    19,291
    Thanks
    2
    Thanked 1,043 Times in 1,019 Posts
    For more compatibility among servers and without having to change the ini file you should probably use curl. It will do kind of the same thing as file_get_contents. There are many examples of how to use curl out there. You just need to search.
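    For illustration, a bare-bones cURL fetch could look something like this. The function name fetchWithCurl, the 10-second timeout and the limit of 5 redirects are arbitrary choices for the sketch, not recommendations from this thread; it assumes PHP 7.1 or later with the cURL extension installed, and returns null if the extension is missing or the request fails.

```php
<?php
// Hypothetical helper: downloads a page with cURL and returns its body,
// or null on any failure.
function fetchWithCurl(string $url, int $timeoutSeconds = 10): ?string
{
    if (!extension_loaded('curl')) {
        return null; // cURL extension not available on this server
    }
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow redirects (e.g. to www.)
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // give up on dead hosts quickly
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($body === false || $status >= 400) {
        return null; // network error or HTTP error status
    }
    return $body;
}
```

    The per-request timeout matters with a list of 7,000 domains, because a handful of unresponsive hosts would otherwise hold up the whole run.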
    ||||If you are getting paid to do a job, don't ask for help on it!||||

