  1. #1
    Regular Coder
    Join Date
    Jan 2011
    Posts
    120
    Thanks
    6
    Thanked 2 Times in 2 Posts

    PHP link scraping

    Hi guys! I'm trying to turn a page filled with a giant table of links into an array that I can use to check the links for validity. I realize there are better ways to do this; it's more of a learning exercise than anything. However, the code below, which I've been trying to adapt, gives no results. Are there any noticeable reasons why it isn't giving me the desired result?

    Thanks in advance!!

    Matt

    Here is a sample of a row from the table I am trying to scrape.
    Code:
    <tr> 
    <td>1</td><td>The Hangover 2</td><td>http://www.novamov.com/video/kcyzc7aoduw12</td><td>http://www.putlocker.com/file/F72561F9414120CA</td><td>http://www.putlocker.com/file/24D2A737D555C0D9</td><td>http://www.putlocker.com/file/98592CE881B32D29</td><td>http://www.sockshare.com/file/1BE3ED2D67C9918E</td></tr>

    And here is my code that is returning 0 results:
    PHP Code:

    <?php
    // get the HTML
    $html = file_get_contents("choosing to hide url here");

    preg_match_all(
        '/<tr> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <td>(.*?)<\/td> 
    <\/tr>/s',
        $html,
        $posts,
        PREG_SET_ORDER // formats data into an array of posts
    );
    $num_records = @mysql_num_rows($posts);

    foreach ($posts as $post) {
        $movie_id = $post[1];
        $title = $post[2];
        $version1 = $post[3];
        $version2 = $post[4];
        $version3 = $post[5];
        $version4 = $post[6];
        $version5 = $post[7];
    }

    if ($num_records < 1) {
        print "No results";
    } else {
        echo $posts;
    };
    ?>
    Last edited by MattClark; 09-13-2011 at 09:13 AM.

  • #2
    Regular Coder
    Join Date
    May 2011
    Posts
    241
    Thanks
    1
    Thanked 57 Times in 56 Posts
    You should use count(), not mysql_num_rows(), when you need to count array elements.

    Try the following code:

    PHP Code:
    $html = '<tr> 
    <td>1</td><td>The Hangover 2</td><td>http://www.novamov.com/video/kcyzc7aoduw12</td><td>http://www.putlocker.com/file/F72561F9414120CA</td><td>http://www.putlocker.com/file/24D2A737D555C0D9</td><td>http://www.putlocker.com/file/98592CE881B32D29</td><td>http://www.sockshare.com/file/1BE3ED2D67C9918E</td></tr>';

    // \s* allows optional whitespace after <tr>; the cells themselves sit back to back
    $pattern = '#<tr>\s*<td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td><td>(.*?)</td></tr>#si';
    if (preg_match_all($pattern, $html, $posts, PREG_SET_ORDER))
    {
            foreach ($posts as $post)
            {
                    print_r($post);
                    $movie_id = $post[1];
                    $title = $post[2];
                    $version1 = $post[3];
                    $version2 = $post[4];
                    $version3 = $post[5];
                    $version4 = $post[6];
                    $version5 = $post[7];
            }
    }
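
For anyone wondering why the pattern in the first post finds nothing: as far as I can tell, it expects a space and a line break after every `</td>`, while the sample row has its cells packed together on one line. The sketch below runs both styles of pattern against that sample row so the difference is visible. The variable names (`$cell`, `$strict`, `$tolerant`) are made up for illustration, and the tolerant pattern is slightly looser than the one above because it also allows whitespace between cells.

```php
<?php
// Sample row copied from the first post (the line break after <tr> is kept).
$html = "<tr> \n<td>1</td><td>The Hangover 2</td><td>http://www.novamov.com/video/kcyzc7aoduw12</td><td>http://www.putlocker.com/file/F72561F9414120CA</td><td>http://www.putlocker.com/file/24D2A737D555C0D9</td><td>http://www.putlocker.com/file/98592CE881B32D29</td><td>http://www.sockshare.com/file/1BE3ED2D67C9918E</td></tr>";

// One table cell with a lazy capture group.
$cell = '<td>(.*?)<\/td>';

// Strict: like the first post, requires a space and a newline after every cell.
$strict = '/<tr> \n' . str_repeat($cell . ' \n', 7) . '<\/tr>/s';

// Tolerant: \s* accepts any amount of whitespace between tags, including none.
$tolerant = '/<tr>\s*' . str_repeat($cell . '\s*', 7) . '<\/tr>/si';

$strictCount = preg_match_all($strict, $html, $strictRows, PREG_SET_ORDER);
$tolerantCount = preg_match_all($tolerant, $html, $rows, PREG_SET_ORDER);

// count() is the right way to size the resulting array;
// mysql_num_rows() only works on MySQL result resources.
echo "strict matches: " . $strictCount . "\n";
echo "tolerant matches: " . count($rows) . "\n";

foreach ($rows as $row) {
    echo $row[2] . " has " . (count($row) - 3) . " links\n";
}
```

Each `$row` holds the full match at index 0, the id at 1, the title at 2, and the links at 3 through 7, which is why the link count subtracts three.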


  • #3
    Regular Coder
    Join Date
    Jan 2011
    Posts
    120
    Thanks
    6
    Thanked 2 Times in 2 Posts
    I'm slightly confused. Both pieces of code do the same thing: they pull every link from a text file that contains every link on my site. What I'm ultimately trying to do is get the page content of each individual link and make sure that the movie player is still embedded on the page each link goes to.

    When I do it, it gets the page content of every link, but it puts everything onto the same page, so I can't scrape the HTML of each one individually. I'm guessing I'm supposed to remove them from the array? But I'm not entirely sure how.
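
One possible way to keep the pages apart is to fetch and test each link inside the loop instead of echoing everything, so every page lives in its own variable and nothing has to be removed from the array. This is only a rough sketch under assumptions: `checkLinks()` is a hypothetical helper, the `'<embed'` marker is a guess at what the player markup looks like, and the example.com URLs with canned pages stand in for real requests so it runs offline.

```php
<?php
/**
 * Check each link on its own and record whether the player marker appears.
 * $fetch is any callable that takes a URL and returns the page HTML,
 * or false when the page cannot be loaded.
 */
function checkLinks(array $links, string $marker, callable $fetch): array
{
    $results = [];
    foreach ($links as $link) {
        $page = $fetch($link); // one request per link, one string per page
        if ($page === false) {
            $results[$link] = 'unreachable';
            continue;
        }
        // Case-insensitive search for the marker inside this page only.
        $results[$link] = (stripos($page, $marker) !== false)
            ? 'player found'
            : 'player missing';
    }
    return $results;
}

// Stand-in fetcher with canned pages (hypothetical URLs and markup).
$fakePages = [
    'http://example.com/a' => '<html><embed id="player"></html>',
    'http://example.com/b' => '<html>File removed</html>',
];
$fetch = function (string $url) use ($fakePages) {
    return $fakePages[$url] ?? false;
};

$links = ['http://example.com/a', 'http://example.com/b', 'http://example.com/c'];
$report = checkLinks($links, '<embed', $fetch);
print_r($report);
```

To run it against real pages, passing `'file_get_contents'` as `$fetch` should work, since it takes a URL and returns false on failure. The `$links` array could be filled from `$post[3]` through `$post[7]` of each matched row.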

  • #4
    Regular Coder
    Join Date
    Jan 2011
    Posts
    120
    Thanks
    6
    Thanked 2 Times in 2 Posts
    Bump. If anyone knows how to fix this problem, I would greatly appreciate it!

