  1. #1
    New to the CF scene
    Join Date
    Mar 2007
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Pulling information from the source of another site

    Hi, what I am trying to do is pull information from the generated source code of this site: http://armory.worldofwarcraft.com/#c...7dan&n=Palaran
    I then want to manipulate it within my own code to show certain elements pulled from that site's source. I am somewhat experienced with PHP and would like to use it for the development. I am sorry if I didn't explain my issue well enough; just let me know. And if I am asking a question that has already been answered, please let me know how to find it. I tried searching for the answer but couldn't find it.

    Thanks in advance

  • #2
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,537
    Thanks
    8
    Thanked 1,093 Times in 1,084 Posts
    You'll be looking at something similar to the code below, where you open the page and parse out the various things between tags. The example is part of an RSS feed generator, but the idea is the same ...

    PHP Code:
    <?php

    // Get page
    $url = "http://armory.worldofwarcraft.com/#character-sheet.xml?r=Gul%27dan&n=Palaran";
    $data = implode("", file($url));

    // Get content items between <html> and </html>
    preg_match_all("/<html>([^`]*?)<\/html>/", $data, $matches);

    // Loop through each item
    foreach ($matches[0] as $match) {
        // Get title
        preg_match("/<title>([^`]*?)<\/title>/", $match, $temp);
        $title = $temp[1];
        $title = strip_tags($title);
        $title = trim($title);

        // Get an item in the <h4> header area of the page
        preg_match("/<h4>([^`]*?)<\/h4>/", $match, $temp);
        $date = $temp[1];
        $date = trim($date);

        // Get some text between <p> and </p>
        preg_match("/<p>([^`]*?)<\/p>/", $match, $temp);
        $text = $temp[1];
        $text = trim($text);
    }

    // output the things you found
    echo "Title: $title <br>\n";

    ?>

  • #3
    Senior Coder
    Join Date
    Jan 2007
    Posts
    1,648
    Thanks
    1
    Thanked 58 Times in 54 Posts
    Remember to mention on the page that you are grabbing data from that website.

    Stealing content is bad. Borrowing is okay.

  • #4
    New to the CF scene
    Join Date
    Mar 2007
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for the help. It is exactly what I was trying to do, or at least a start. The problem is that I am having a hard time figuring out how to find the element I want. When I view the generated source in Firefox, one of the variables I want, say his level, appears in the source as
    Code:
    ...
    var theClassId = 2;
    var theRaceId = 10;
    var theClassName = "Paladin";
    var theLevel = 68;
    var theCharUrl = "r=Gul%27dan&n=Palaran";
    ...
    I want to pull this info from there to my site. Any ideas? lol sorry for the questions.

    Thanks in advance again.

  • #5
    New to the CF scene
    Join Date
    Mar 2007
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Here's my latest attempt. I can't figure out how to get it to find that text. Maybe someone can show me where I am making the mistake. Thanks again.

    PHP Code:
    <?php

    // Get page
    $url = "http://armory.worldofwarcraft.com/#character-sheet.xml?r=Gul%27dan&n=Palaran";
    $data = implode("", file($url));

    // Get content items between <html> and </html>
    preg_match_all("/<html>([^`]*?)<\/html>/", $data, $matches);

    // Loop through each item
    foreach ($matches[0] as $match) {
        // Get title
        preg_match("/<title>([^`]*?)<\/title>/", $match, $temp);
        $title = $temp[1];
        $title = strip_tags($title);
        $title = trim($title);

        // Get some text for the classid
        preg_match("/var theClassId = ([^`]*?);/", $match, $temp);
        $classid = $temp[1];
        $classid = trim($classid);
    }

    // output the things you found
    echo "Title: $title <br>\n";
    echo "Class ID: $classid <br>\n";

    ?>

  • #6
    Master Coder
    Join Date
    Jun 2003
    Location
    Cottage Grove, Minnesota
    Posts
    9,537
    Thanks
    8
    Thanked 1,093 Times in 1,084 Posts
    I think I might see something that's a problem ...

    See the sample code below. If you run it and view the whole match, the part you're looking for does not exist in it. This could be because they are using JavaScript to populate the portion of the page you're looking for.

    Grabbing the URL and parsing the HTML only gives you the markup the server sends. PHP never runs the page's JavaScript, so anything the script generates won't be in what you fetched.

    PHP Code:
    <?php

    // Get page
    $url = "http://armory.worldofwarcraft.com/#character-sheet.xml?r=Gul%27dan&n=Palaran";
    $data = implode("", file($url));

    // Get content items between <html> and </html>
    preg_match_all("/<html>([^`]*?)<\/html>/", $data, $matches);

    foreach ($matches[0] as $match) {
        // don't match, just grab everything
    }

    // output the whole match
    echo $match;

    ?>
    Now, if they were to generate an RSS feed ... which would be a great idea on their part ... you would have everything you need. But many sites either don't know how, or feel it allows people to steal information. I disagree with the latter, because RSS feeds generate interest among people who would not normally see your site, and therefore draw more visitors to it.
    Last edited by mlseim; 03-20-2007 at 06:00 PM.

  • #7
    New to the CF scene
    Join Date
    Mar 2007
    Posts
    7
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Very interesting. Well, I guess the final answer is that I can't pull the info, because everything I want is either part of the JavaScript or is written to the page by JavaScript with the write command. Thanks for all the help. Unless you have a cure for that issue, in which case I will call you God. lol

  • #8
    New Coder
    Join Date
    May 2005
    Location
    Leeds, UK
    Posts
    83
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Just jumping in here. If the values you want are in the JavaScript, there are two things you could do:
    1. When doing your pattern matching, just match the stuff between <script> tags. That should make things easier for you.
    2. Just grab the URL (as you are doing), then add a bit of JavaScript to the end that either writes the required variables to the screen (via document.write or alert()) or redirects to a PHP page, passing the values of the variables in the URL, e.g. window.location="output.php?theLevel="+theLevel+"&theRaceId="+theRaceId
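
    Option 1 above could look roughly like the sketch below. It is only an illustration: the $html string is a made-up stand-in for the fetched page, extractScriptBlocks() is a hypothetical helper name, and it only works if the variables really are present in the source your script receives.

```php
<?php

// Hypothetical stand-in for the fetched page source; in practice this would
// be the string your script already loads from the URL.
$html = '<html><head><title>Sample</title>
<script type="text/javascript">
var theClassId = 2;
var theLevel = 68;
</script>
</head><body><p>var theLevel = 1;</p></body></html>';

// Return the body of every inline <script> block found in $html.
function extractScriptBlocks(string $html): array
{
    // "s" lets "." span newlines, "i" ignores tag case, and ".*?" is lazy,
    // so each match stops at the nearest closing </script> tag.
    preg_match_all('/<script\b[^>]*>(.*?)<\/script>/si', $html, $matches);
    return $matches[1];
}

$scripts = extractScriptBlocks($html);

// Search only inside the script bodies, so look-alike text elsewhere on the
// page (such as the <p> above) is ignored.
$level = null;
foreach ($scripts as $script) {
    if (preg_match('/var\s+theLevel\s*=\s*(\d+)\s*;/', $script, $m)) {
        $level = (int) $m[1];
        break;
    }
}

echo "Level: ", var_export($level, true), "\n";
```

    Narrowing the search to the script blocks first keeps the second pattern simple and avoids false hits in the visible page text.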

  • #9
    Senior Coder
    Join Date
    Jan 2007
    Posts
    1,648
    Thanks
    1
    Thanked 58 Times in 54 Posts
    var theClassId = 2;
    var theRaceId = 10;
    var theClassName = "Paladin";
    var theLevel = 68;
    var theCharUrl = "r=Gul%27dan&n=Palaran";
    There should be no problem grabbing this information.

    Just have a regular expression for this pattern: var * = *; (the * standing for any run of characters). Make sure that you make it lazy, and not greedy; otherwise you will get:

    Code:
    theRaceId = 10;
    var theClassName = "Paladin";
    var theLevel = 68;
    var theCharUrl
    As the first variable name :P
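
    A rough sketch of the lazy pattern in PHP, run against the variables quoted above. parseJsVars() is a hypothetical helper name, and the sketch assumes no value contains a ";" of its own.

```php
<?php

// Sample taken from the generated source quoted in this thread.
$js = 'var theClassId = 2;
var theRaceId = 10;
var theClassName = "Paladin";
var theLevel = 68;
var theCharUrl = "r=Gul%27dan&n=Palaran";';

// Collect every "var name = value;" assignment into a name => value array.
function parseJsVars(string $js): array
{
    $vars = [];
    // (\w+) captures the variable name; (.*?) is lazy, so the value stops
    // at the first ";" that follows rather than the last one.
    preg_match_all('/var\s+(\w+)\s*=\s*(.*?);/', $js, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        // Strip surrounding spaces and quotes from the captured value.
        $vars[$m[1]] = trim($m[2], " \"'");
    }
    return $vars;
}

$vars = parseJsVars($js);
echo "Class: {$vars['theClassName']}, level: {$vars['theLevel']}\n";

// Greedy counterpart for comparison: with the "s" modifier, ".*" runs on to
// the last " = " in the input, so the "name" swallows several lines.
preg_match('/var (.*) = (.*);/s', $js, $greedy);
```

    Here $greedy[1] starts at theClassId and runs all the way to theCharUrl, which is the kind of result the warning above describes.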

