  1. #1
    New Coder
    Join Date
    Jul 2008
    Posts
    57
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Take data from website using PHP

    Hi,

    I wrote the code below to capture the images from a website when I submit the URL. In the same way, I want to capture the text from the website. Please tell me what the code for that would be.

    PHP Code:
    <?php
    // $url is expected to already hold the submitted page address.
    $content = file_get_contents($url);

    // Match every <img> tag; group 3 captures the value of the src attribute.
    preg_match_all("/<img(.*)src=(\"|')(.*)(\"|')(.*)[\/]?>/siU", $content, $match, PREG_PATTERN_ORDER);

    echo "<b>Capture Images :</b><br>";
    echo "<br>";
    print_r($match[0]); // the full <img> tags
    echo "<br>";
    echo "<br>";
    echo "<b>Capture Images URLS :</b><br><br>";
    // $match from the call above is reused; running the same regex again is not needed.
    print_r($match[3]); // the src values only
    ?>

  • #2
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,994
    Thanks
    4
    Thanked 2,662 Times in 2,631 Posts
    Capturing text is far more difficult than capturing images. The problem is consistency: we cannot be certain that the text has been stored in p, span, or li tags. Unless the page declares a doctype, there is no guarantee that text will be placed in the proper elements.
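If you do know which tags hold the text you want, one way to pull it out is PHP's built-in DOMDocument class rather than a regex. This is only a rough sketch, not something taken from the thread: the function name extract_text, the default tag list, and the sample markup are all made up for illustration, and it assumes the DOM extension is enabled.

```php
<?php
// Hypothetical helper (not from the thread): collect the text of chosen tags.
function extract_text(string $html, array $tags = ['p', 'span', 'li']): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($tags as $tag) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            // textContent drops nested markup and keeps only the text.
            $text = trim($node->textContent);
            if ($text !== '') {
                $texts[] = $text;
            }
        }
    }
    return $texts;
}

// Sample markup standing in for the result of file_get_contents($url).
$html = '<html><body><p>Hello <b>world</b></p><ul><li>One</li><li>Two</li></ul></body></html>';
print_r(extract_text($html));
```

Results are grouped by tag in the order the tags are listed, not in page order, and text inside nested tags (a span inside a p, say) will show up once for each matching ancestor. Pages that do not declare their character set may also need the encoding handled before loadHTML.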

  • #3
    Regular Coder BWiz's Avatar
    Join Date
    Mar 2006
    Location
    Sol System
    Posts
    471
    Thanks
    7
    Thanked 21 Times in 21 Posts
    You could try searching for specific tags (such as p, h1, span) and calculating the number of characters between the opening tag and the closing tag (strpos will give you the positions). With this number you could extract the characters in between the tags (substr). Loop through the entire document looking for these tags and extracting their contents.
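Roughly, the search-and-extract loop described above might look like the sketch below. It is only an illustration under a few assumptions: the function name text_between_tags and the sample markup are invented, it uses stripos/strpos to find the tags and substr to cut out the text, and it does not cope with a tag nested inside another tag of the same name.

```php
<?php
// Hypothetical helper (not from the thread): text between <tag ...> and </tag>.
function text_between_tags(string $html, string $tag): array
{
    $close = "</$tag>";
    $results = [];
    $offset = 0;

    while (($start = stripos($html, "<$tag", $offset)) !== false) {
        // Skip tags that merely share a prefix, e.g. <pre> when looking for <p>.
        $next = $html[$start + strlen($tag) + 1] ?? '';
        if ($next !== '>' && !ctype_space($next)) {
            $offset = $start + 1;
            continue;
        }
        // The content begins right after the '>' that ends the opening tag.
        $open_end = strpos($html, '>', $start);
        if ($open_end === false) {
            break;
        }
        $end = stripos($html, $close, $open_end);
        if ($end === false) {
            break;
        }
        // Number of characters between the opening and the closing tag.
        $length = $end - ($open_end + 1);
        // strip_tags removes any inline markup left inside the extracted text.
        $results[] = trim(strip_tags(substr($html, $open_end + 1, $length)));
        $offset = $end + strlen($close);
    }
    return $results;
}

// Sample markup standing in for the downloaded page.
$html = '<h1>Title</h1><pre>skip</pre><p class="x">First <b>one</b></p><p>Second</p>';
print_r(text_between_tags($html, 'p'));
```

Call it once per tag you care about (p, h1, span and so on) and merge the arrays if you want everything in one list.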

  • #4
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,513
    Thanks
    8
    Thanked 1,090 Times in 1,081 Posts
    Just out of curiosity, what is the website you want to parse?

  • #5
    Regular Coder
    Join Date
    Jun 2002
    Location
    Adirondacks
    Posts
    516
    Thanks
    4
    Thanked 4 Times in 4 Posts
    Google "spidering web pages" or "data mining".
    There are a number of scripts out there you could use.

