  1. #1
    New to the CF scene
    Join Date
    Nov 2011
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Parse/Scrape table information from another site

    Hello,

    I won't go into why or what for, but I need to grab data from a website and put it into a MySQL table so I can do whatever I need with it.

    I Googled around and found that you can apparently do it using Simple HTML DOM. The problem is, no matter what I try, nothing seems to work.

    Code:
    include "simple/simple_html_dom.php";
    	
    $html = file_get_html('http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA');
    $es = $html->find('data standings League');
    
    echo($es);
    This is outputting "Array".

    I realize this isn't much code, but I haven't gotten anywhere with the tutorials I found on the net. The table's class is "data standings league".

    Basically, I want to grab the data in the NHL.com league standings and put that info into my own MySQL table. Ideally, I'd like to automate this process every day (although this can wait).

    Any help is appreciated!
    Last edited by jsquadrilla; 11-16-2011 at 06:01 PM.

  • #2
    Senior Coder
    Join Date
    Feb 2011
    Location
    Your Monitor
    Posts
    4,479
    Thanks
    63
    Thanked 538 Times in 525 Posts
    Quote Originally Posted by jsquadrilla
    This is outputting "Array".
    Then you need to var_dump() the result and see what the array contains. The function is probably returning several things in an array including the information you want.
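
    To illustrate the point, here is a minimal sketch of why echo prints "Array" and how to look inside the result instead. The $results array is a made-up stand-in for whatever find() returns, not real output from Simple HTML DOM.

```php
<?php
// Hypothetical stand-in for the array a find()-style call might return.
$results = array('first match', 'second match');

// Converting an array to a string yields the literal text "Array"
// (PHP also raises an "Array to string conversion" notice or warning,
// suppressed here with @ to keep the output clean).
$asString = @(string) $results;
echo $asString . "\n";

// var_dump() shows the type, length and contents instead.
var_dump($results);

// To print each element, loop over the array.
foreach ($results as $index => $item) {
    echo $index . ': ' . $item . "\n";
}

// An empty array means nothing matched.
$noMatches = array();
echo count($noMatches) === 0 ? "no matches\n" : "has matches\n";
```

    If var_dump() reports an empty array, the problem is most likely in what was passed to find(), not in how the result is printed.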
    I can't really think of anything to write here now...

  • #3
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,549
    Thanks
    8
    Thanked 1,095 Times in 1,086 Posts
    wow ... that is going to be a tough one.
    I know you want to do this for free, but find out how much it would
    cost to get an account with them, and possibly ... they may have an
    API or XML file that contains the data. Even if it costs some money
    each month, it might be worth it. At least ask them about it.

    Parsing their HTML is not going to be easy, but technically, it could be
    done with enough scripting and enough time to debug it.


    .

  • #4
    New to the CF scene
    Join Date
    Nov 2011
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by tangoforce
    Then you need to var_dump() the result and see what the array contains. The function is probably returning several things in an array including the information you want.
    Now it says "array(0) { } Array"

    Quote Originally Posted by mlseim
    wow ... that is going to be a tough one.
    I know you want to do this for free, but find out how much it would
    cost to get an account with them, and possibly ... they may have an
    API or XML file that contains the data. Even if it costs some money
    each month, it might be worth it. At least ask them about it.

    Parsing their HTML is not going to be easy, but technically, it could be
    done with enough scripting and enough time to debug it.


    .
    Can't be that hard? I mean, I know next to nothing about scraping, but if the page structure is constant (the table name stays the same, etc.), it shouldn't be too difficult to figure out. Frustratingly, Google Docs does exactly what I want with its "importhtml" function: it puts the table in a spreadsheet perfectly, so there has to be a way (if only I knew how importhtml worked...)

  • #5
    New to the CF scene
    Join Date
    Nov 2011
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Minor success!

    Code:
    function get_data($url)
    {
      $ch = curl_init();
      $timeout = 5;
      curl_setopt($ch,CURLOPT_URL,$url);
      curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
      curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,$timeout);
      $data = curl_exec($ch);
      curl_close($ch);
      return $data;
    }
    
    $returned_content = get_data('http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA');
    
    echo $returned_content;
    When I run this, I get the exact page, served from my own URL, which is perfect. Now I just need to find a way to extract what I need from this page, which is the next step and where I'm completely lost.
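
    One possible next step, sketched with PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM: load the fetched string and locate the table by its class. The sample markup below is an assumption standing in for $returned_content, since the real page source isn't shown in this thread; only the class name comes from the first post.

```php
<?php
// Stand-in for $returned_content. The real NHL markup is not shown in this
// thread, so this sample structure is an assumption.
$returned_content = <<<HTML
<html><body>
<table class="nav"><tr><td>menu</td></tr></table>
<table class="data standings League">
  <tr><th>Team</th><th>GP</th><th>P</th></tr>
  <tr><td><a href="/team-a">Team A</a></td><td>18</td><td>26</td></tr>
  <tr><td><a href="/team-b">Team B</a></td><td>17</td><td>23</td></tr>
</table>
</body></html>
HTML;

$doc = new DOMDocument();
// Real-world pages are rarely valid markup, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($returned_content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match a <table> whose class attribute contains the token "standings".
$query = '//table[contains(concat(" ", normalize-space(@class), " "), " standings ")]';
$tables = $xpath->query($query);

$table = $tables->length > 0 ? $tables->item(0) : null;
if ($table === null) {
    echo "No standings table found\n";
} else {
    // Count the rows inside the matched table only.
    $rowCount = $xpath->query('.//tr', $table)->length;
    echo "Found the table with $rowCount rows\n";
}
```

    Matching on a single class token rather than the full class string should keep the query working if other classes on the table change.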

  • #6
    Regular Coder
    Join Date
    Jul 2010
    Location
    Oregon City
    Posts
    280
    Thanks
    5
    Thanked 50 Times in 49 Posts
    Quote Originally Posted by jsquadrilla
    Minor success!

    When I run this, I get the exact page, served from my own URL, which is perfect. Now I just need to find a way to extract what I need from this page, which is the next step and where I'm completely lost.

    Regular expressions. Have fun with that!

  • #7
    New to the CF scene
    Join Date
    Nov 2011
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Adee
    Regular expressions. Have fun with that!
    Any idea where I can start?

    Using Simple HTML DOM, my code is just:

    Code:
    $html = file_get_html('http://www.nhl.com/ice/standings.htm?season=20112012&type=LEA');
    $e = $html->find("table", 2);
    echo $e;
    It displays the table, but the output still has hrefs, classes, etc. I want to strip all of that and keep just the data. Just looking for a kick in the right direction on how to do that.
    Last edited by jsquadrilla; 11-16-2011 at 09:22 PM.
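
    A possible direction for stripping the markup, sketched with PHP's bundled DOM extension instead of Simple HTML DOM. table_to_rows() is a hypothetical helper name and the sample table is made up; the idea is that reading each cell's text content discards links, classes and other attributes.

```php
<?php
// Hypothetical helper: turn an HTML table fragment into an array of rows,
// each row an array of plain-text cell values (no tags or attributes).
function table_to_rows($tableHtml)
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    // The XML declaration hint makes libxml treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $tableHtml);
    libxml_clear_errors();

    $rows = array();
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = array();
        foreach ($tr->childNodes as $cell) {
            // Keep only td/th elements and skip whitespace text nodes.
            if ($cell instanceof DOMElement
                && in_array($cell->tagName, array('td', 'th'), true)) {
                // textContent drops nested tags such as links.
                $cells[] = trim($cell->textContent);
            }
        }
        if ($cells) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Made-up sample table for illustration only.
$sample = '<table class="data standings League">'
    . '<tr><th>Team</th><th>GP</th></tr>'
    . '<tr><td class="team"><a href="/team-a">Team A</a></td><td>18</td></tr>'
    . '</table>';

print_r(table_to_rows($sample));
```

    Since echo $e already prints the table's HTML, something like table_to_rows((string) $e) may work against the real page, but that is untested here.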

  • #8
    Regular Coder
    Join Date
    Jul 2010
    Location
    Oregon City
    Posts
    280
    Thanks
    5
    Thanked 50 Times in 49 Posts
    Is this what you're trying to do?
    http://rs-downfall.com/scripts/cf/nhl.php

  • #9
    New to the CF scene
    Join Date
    Nov 2011
    Posts
    8
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Adee
    Is this what you're trying to do?
    http://rs-downfall.com/scripts/cf/nhl.php
    That's what I've got so far (minus the weird stray character that shows up on yours).

    Basically, I want to put that information, as-is, into a database.

    But, I believe I'd need to strip it down before I can.
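
    Once the table is reduced to plain rows, storing them could look roughly like the sketch below, which uses PDO with a prepared statement. The row values, the table name "standings" and its columns are all assumptions, and an in-memory SQLite database stands in for MySQL so the example runs without a server.

```php
<?php
// Rows as they might look after stripping the table down to plain text.
// These values are made up for illustration.
$rows = array(
    array('Team A', 18, 26),
    array('Team B', 17, 23),
);

// An in-memory SQLite database keeps this sketch runnable without a server.
// For MySQL the DSN would be along the lines of
// 'mysql:host=localhost;dbname=your_db' plus a username and password
// (placeholder values, not taken from this thread).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE standings (team VARCHAR(64), games_played INT, points INT)');

// A prepared statement lets the same INSERT run once per scraped row
// and keeps the scraped values out of the SQL text.
$insert = $pdo->prepare(
    'INSERT INTO standings (team, games_played, points) VALUES (?, ?, ?)'
);
foreach ($rows as $row) {
    $insert->execute($row);
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM standings')->fetchColumn();
echo "Stored $count rows\n";
```

    For a daily refresh, the table would probably need to be emptied or the rows updated before each insert, otherwise every run adds a duplicate set.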

