  1. #1
    Regular Coder

    Checking to see if a site is updated

    Is there a way in PHP to check whether a specific web page has been updated (without relying on RSS, of course)?

    That is, could a script fetch the URL and then check whether www.foo.com/index.html (or /index.asp, /index.php, etc.) has been updated since the last check?

  • #2
    Senior Coder kbluhm
    Get the page's source code, then hash it with sha1() or md5() and save the hash.

    Next time you check, do the same. If the hash has changed, the source code has changed, so the page has been modified or updated.
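A minimal PHP sketch of that idea, assuming the previous hash is kept in a plain text file. The function name `page_has_changed()` and the state-file path are hypothetical, and fetching the page is left to the caller so any method of getting the source will do.

```php
<?php
// Sketch of the hash-and-compare approach described above.
// Assumptions (not from the thread): the last hash lives in a text file,
// and page_has_changed() is a hypothetical helper name.

/**
 * Returns true if $source hashes differently from the hash stored in
 * $stateFile, then saves the new hash for the next check.
 * On the very first run (no stored hash yet) it returns false.
 */
function page_has_changed(string $source, string $stateFile): bool
{
    $newHash = sha1($source);

    // Previous fingerprint, or null if this is the first check.
    $oldHash = is_file($stateFile) ? trim(file_get_contents($stateFile)) : null;

    // Remember the current fingerprint for the next run.
    file_put_contents($stateFile, $newHash);

    return $oldHash !== null && $oldHash !== $newHash;
}

// Usage: fetch the page, then compare against the previous run.
// $source = file_get_contents('http://www.foo.com/index.html');
// if ($source !== false && page_has_changed($source, '/tmp/foo.sha1')) {
//     echo "Page was updated\n";
// }
```

Either sha1() or md5() works here; the hash is only a fingerprint for change detection, not a security measure.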

  • Users who have thanked kbluhm for this post:

    Bobafart (01-21-2008)

  • #3
    Regular Coder
    Quote Originally Posted by kbluhm
    Get the page's source code, then sha1()/md5() it and save it.

    Next time you check, do the same. If the sha1/md5 has changed, the source-code has changed... so the site has been modified or updated.

    What do you mean by "get the page's source code"?

    Do you mean using fget() on the URL, i.e. fget('http://www.cnn.com')?

  • #4
    Senior Coder kbluhm
    By "get the page's source code" I mean, literally, get the page's source code:
    PHP Code:
    $source = file_get_contents( $url );  // fetch the page's raw HTML
    $hash   = md5( $source );             // fingerprint to save and compare next time
    If the site writes anything dynamic to the page, such as the current time or the number of users online, the hash will change on every request and wreak havoc on your plans. You could grab the source and use a regular expression to extract only a portion of the markup you know changes with real updates, but that severely reduces the script's ability to work with any arbitrary site you plug in.
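One way to follow that suggestion, sketched in PHP: hash only the part of the page captured by a regular expression, and fall back to the whole page if nothing matches. The default pattern (the contents of a `<div id="content">`) and the name `stable_fingerprint()` are hypothetical placeholders you would tailor to each site.

```php
<?php
// Sketch: fingerprint only a "stable" region of the page so that dynamic
// bits elsewhere (clocks, users-online counters) don't trigger false alarms.
// The default pattern is a hypothetical example, not a universal rule.

function stable_fingerprint(string $source, string $pattern = '#<div id="content">(.*?)</div>#s'): string
{
    // If the pattern matches, hash only the captured region.
    if (preg_match($pattern, $source, $m) === 1) {
        return md5($m[1]);
    }
    // Otherwise fall back to hashing the whole page.
    return md5($source);
}
```

The non-greedy `(.*?)</div>` stops at the first closing `</div>`, so it breaks on nested divs; that kind of per-site tuning is exactly the cost mentioned above.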

    You could also use stream_get_meta_data():
    PHP Code:
    $fp   = fopen( $url, 'r' );            // open an HTTP stream to the page
    $data = stream_get_meta_data( $fp );   // metadata, including the response headers
    print_r( $data['wrapper_data'] );
    fclose( $fp );
    This will give you something like:
    Code:
    Array
    (
        [0] => HTTP/1.1 200 OK
        [1] => Date: Fri, 18 Jan 2008 15:31:42 GMT
        [2] => Server: Apache/2.2.0 (Fedora)
        [3] => Last-Modified: Sat, 23 Dec 2006 20:05:22 GMT
        [4] => ETag: "9f251c-795f-15998480"
        [5] => Accept-Ranges: bytes
        [6] => Content-Length: 31071
        [7] => Connection: close
        [8] => Content-Type: text/html
    )
    You could then use the Last-Modified header to see when the requested URL was last modified. Not every server sends this header, though; dynamically generated pages often omit it.
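A sketch of reading that header out of the `wrapper_data` array shown above and converting it to a Unix timestamp you can compare with the time of your last check. The helper names are hypothetical. It keeps the last matching header because, when a URL redirects, `wrapper_data` can contain the headers of each response in turn.

```php
<?php
// Sketch: extract Last-Modified from stream_get_meta_data()'s wrapper_data.
// find_header() and last_modified() are hypothetical helper names.

function find_header(array $wrapperData, string $name): ?string
{
    $found = null;
    foreach ($wrapperData as $line) {
        // Entries look like "Name: value"; header names are case-insensitive.
        if (stripos($line, $name . ':') === 0) {
            // Keep the last match so a redirect's headers don't win.
            $found = trim(substr($line, strlen($name) + 1));
        }
    }
    return $found;
}

function last_modified(array $wrapperData): ?int
{
    $value = find_header($wrapperData, 'Last-Modified');
    if ($value === null) {
        return null; // server did not send the header
    }
    $ts = strtotime($value);
    return $ts === false ? null : $ts;
}

// Usage against a live URL:
// $fp   = fopen($url, 'r');
// $meta = stream_get_meta_data($fp);
// fclose($fp);
// $ts = last_modified($meta['wrapper_data']);
// if ($ts !== null && $ts > $lastCheck) { echo "Updated\n"; }
```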

    Actually, if you visit http://www.php.net/stream_get_meta_data, the current top-most comment (by ed at readinged dot com) is probably exactly what you're looking for.
    Last edited by kbluhm; 01-18-2008 at 04:50 PM.

  • Users who have thanked kbluhm for this post:

    Bobafart (01-21-2008)

