  1. #1
    Regular Coder
    Join Date
    Oct 2004
    Location
    London E4 UK
    Posts
    320
    Thanks
    0
    Thanked 0 Times in 0 Posts

    problem parsing a series of atom xml files

    Hi, I've managed to parse a few xml feeds without problems, but I'm really struggling with a rather fiddly atom feed. It's set up as a series of pages, each with 20-odd entries. I have no problem parsing the data from each page and I can loop through each successive page without problems.

    20 pages takes 20 seconds to parse, 40 pages takes 40 seconds to parse and so on. But when I try to do more than 60 pages the script seems to choke or stall, or generally get so slow that it stops. I can't figure out why or what to do now. Surely I'm just reading in a lot of small feeds one after the other, so this should be easy.

    My code is below, any tips at all much appreciated. I'm sure you can see from the code I'm rather low-tech, so simple is good for me. Thank you.

    PHP Code:
    <?php
    session_start();

    include("dbconnect.php");

    //script start point
    list($usec, $sec) = explode(' ', microtime());
    $script_start = (float) $sec + (float) $usec;
    //clear the table for the new data we're about to refresh from the xml feed
    mysql_query("delete from FrenchRentals")
    or die(mysql_error());

    for ($n = 1; $n <= 80; $n++)
    {
        $feed_url = 'https://partner.homeaway.eu/aggregator/london_prd_en/partners/search?partnerId=cj_world&format=cjAtom&affiliateId=3645474&page='.$n;

        $xml_source = file_get_contents($feed_url);
        $x = simplexml_load_string($xml_source);

        if (count($x) == 0)
            return;

        foreach ($x->entry as $entry)
        {
            $Country = (string) $entry->content->listing->data['country'];
            if ($Country == "FR")//only French properties
            {
                $AdTitle= (string) $entry->content->listing->headline;
                $ClassifiedText = (string) $entry->content->listing->description;
                $XmlCode = (string) $entry->content->listing['unitId'];
                $Photo1 = (string) $entry->content->listing['imageUrl'];
                $PriceFrom = (string) $entry->content->listing->rates->rate['from'];
                $PriceTo = (string) $entry->content->listing->rates->rate['to'];
                $RateCurrency = (string) $entry->content->listing->rates->rate['currencyUnit'];
                $PropertyType = (string) $entry->content->listing->data['propertyType'];
                $Bedrooms = (string) $entry->content->listing->data['bedrooms'];
                $Sleeps = (string) $entry->content->listing->data['sleeps'];
                $Town = (string) $entry->content->listing->data['city'];
                $Region = (string) $entry->content->listing->data['state'];

                IF (strlen($AdTitle)>0)
                {
                    $ClassifiedText = mysql_real_escape_string($ClassifiedText);
                    $AdTitle = mysql_real_escape_string($AdTitle);

                    //$query = "INSERT INTO FrenchRentals
                    //(AdTitle, ClassifiedText, Sleeps, Photo1, PriceFrom, PriceTo)
                    //VALUES
                    //('$AdTitle', '$ClassifiedText', '$Sleeps', '$Photo1', '$PriceFrom', '$PriceTo')";
                    //echo "query is: ".$query."<BR>";
                    //mysql_query("$query")
                    //or die(mysql_error());

                    $arrayHR[]=array(XmlCode=>$XmlCode,Ref=>$Ref,Region=>$Region,Department=>$Department,Town=>$Town,ClassifiedText=>$ClassifiedText,Photo1=>$Photo1,Price=>$Price,AdTitle=>$AdTitle,Area=>$Area,Land=>$Land,Contact=>$Contact,Name=>$Name,Email=>$Email,Phone=>$Phone,Website=>$Website,SaleRent=>$Category,Bedrooms=>$Bedrooms,Sleeps=>$Sleeps,Category=>$PropertyType,DateJoined=>$DateJoined,PriceFrom=>$PriceFrom,PriceTo=>$PriceTo);
                }

                //unset variables that might not be reset by next feed
                unset($PriceFrom);unset($PriceTo);unset($Sleeps);unset($AdTitle);unset($Photo1);unset($Photo2);unset($Photo3);unset($Land);unset($Website);unset($SaleRent);unset($Category);unset($Region);
                unset($Department);unset($Town);unset($ClassifiedText);unset($Contact);unset($Name);unset($Email);unset($Phone);unset($x);unset($xml_source);unset($feed_url);
            }
        }
    }

    //...........................
    //script end point
    list($usec, $sec) = explode(' ', microtime());
    $script_end = (float) $sec + (float) $usec;

    $elapsed_time = round($script_end - $script_start, 1);
    echo "elapsed time to run ".$elapsed_time." seconds<BR>";

    ?>

  • #2
    Senior Coder
    Join Date
    Jan 2011
    Location
    Missouri
    Posts
    4,353
    Thanks
    23
    Thanked 618 Times in 617 Posts
    I was going to say that xml related questions go here http://codingforums.com/xml/

    But your problem may be due to a PHP timeout. Most servers are set for 30 secs. You may want to do smaller loops, smaller than 80, so as to avoid that, like for($n = 1; $n <= 20; $n++) .
    Evolution - The non-random survival of random variants.

    "If you leave hydrogen alone, for long enough, it begins to think about itself."

  • #3
    Senior Coder
    Join Date
    Sep 2010
    Posts
    2,089
    Thanks
    15
    Thanked 246 Times in 246 Posts
    The time elapsed script I'm using.
    At the start.

    $mt=microtime(true);

    At the end.

    echo "Execution time: ".round(microtime(true)-$mt , 6 );
    Welcome to http://www.myphotowizard.net

    where you can edit images, make a photo calendar, add text to images, and do much more.


    When you know what you're doing it's called Engineering, when you don't know, it's called Research and Development. And you can always charge more for Research and Development.

  • #4
    Regular Coder
    Join Date
    Oct 2004
    Location
    London E4 UK
    Posts
    320
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by sunfighter View Post
    I was going to say that xml related questions go here http://www.codingforums.com/forumdisplay.php?f=3

    But your problem may be due to a PHP timeout. Most servers are set for 30 secs. You may want to do smaller loops, smaller than 80, so as to avoid that, like for($n = 1; $n <= 20; $n++) .
    Thanks, I did umm and ah but guessed this to be a PHP issue rather than xml.

    I did try setting up a loop inside a loop, so the inner loop was doing 20 pages with the outer loop incrementing the inner one to the next 20, but the behavior was just the same. And it's not an xml thing, because I can do any 60 pages at will, i.e. there's nothing sinister about the 60-100 page sequence, just the 80th page.

    On what code block does the PHP timeout count? Does the entire script only get 30 seconds to run, or just a block within the script?

    So I could have the script run repeatedly with a different page start every time?

    And I assume the server timeout is my local server and not the remote one?

  • #5
    Senior Coder
    Join Date
    Jan 2011
    Location
    Missouri
    Posts
    4,353
    Thanks
    23
    Thanked 618 Times in 617 Posts
    You can set the timeout for the script.
    PHP Code:
    set_time_limit(0); 
    will allow your script to run forever, but I don't recommend that. The value is in seconds, so try
    PHP Code:
    set_time_limit(80); 
    and see if that works for you.

  • #6
    Senior Coder CFMaBiSmAd's Avatar
    Join Date
    Oct 2006
    Location
    Denver, Colorado USA
    Posts
    3,092
    Thanks
    2
    Thanked 322 Times in 314 Posts
    If you call set_time_limit(x) inside your loop, it restarts the timeout counter. So, the timeout value will only affect the execution of your loop.
    If you are learning PHP, developing PHP code, or debugging PHP code, do yourself a favor and check your web server log for errors and/or turn on full PHP error reporting in php.ini or in a .htaccess file to get PHP to help you.
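
A minimal sketch of the per-iteration reset described in this post, assuming the host allows the limit to be changed at run time. The function name `process_pages` and the stand-in handler are hypothetical; the real handler would fetch and parse page `$n`.

```php
<?php
// Hypothetical helper: processes pages $first..$last and restarts PHP's
// execution timer before each page, so the limit applies per page rather
// than to the whole run.
function process_pages($first, $last, $per_page_limit, $handler)
{
    $done = array();
    for ($n = $first; $n <= $last; $n++) {
        // Restarts the timeout counter from zero. The @ hides the warning
        // PHP raises when the host forbids changing the limit.
        @set_time_limit($per_page_limit);
        $done[$n] = call_user_func($handler, $n);
    }
    return $done;
}

// Stand-in handler (an assumption for the demo).
$result = process_pages(1, 3, 30, function ($n) {
    return "page $n";
});
print_r(array_keys($result));
```

Two caveats from the PHP manual are worth keeping in mind: `set_time_limit()` has no effect when PHP runs in safe mode, and on non-Windows systems time spent in stream operations such as a remote `file_get_contents()` is not counted towards the limit. So if raising the limit changes nothing, the cut-off may be coming from somewhere other than this setting.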

  • #7
    Regular Coder
    Join Date
    Oct 2004
    Location
    London E4 UK
    Posts
    320
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for both those tips gents, they certainly make sense, trying it now and will report back

  • #8
    Regular Coder
    Join Date
    Oct 2004
    Location
    London E4 UK
    Posts
    320
    Thanks
    0
    Thanked 0 Times in 0 Posts
    hmpf, it does appear as if the script just hangs forever at 80 seconds. I've dickered with the limit on the loop and I can get the script to run to 79.5 seconds but no more.

    I thought aha and set the timeout to 400 but no change.

    So this presumably means there's something else at server level setting an 80s timeout?

    I'm surprised at how slow this script is. I actually have 2,500 pages to parse and at the moment I can't do 80. I'm happy to run this once a night in the middle of the night, but is there another way to tackle this? Perhaps I can simply retrieve the info and save it locally first. I assume the parsing would be lightning fast and the delay is in loading the remote page, in which case this is always going to be a problem.
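
The "retrieve first, parse later" idea in this post could look roughly like the sketch below. The names `cache_pages` and `count_cached_entries`, the `page_N.xml` file naming and the stand-in fetcher are all assumptions for illustration; the real fetcher would request the feed URL for page `$n`.

```php
<?php
// Hypothetical helper: saves each feed page under $dir and skips pages that
// are already on disk, so an interrupted run can be repeated cheaply.
function cache_pages($first, $last, $dir, $fetch)
{
    $saved = 0;
    for ($n = $first; $n <= $last; $n++) {
        $path = $dir . '/page_' . $n . '.xml';
        if (is_file($path)) {
            continue; // downloaded on an earlier run
        }
        $body = call_user_func($fetch, $n);
        if ($body === false || $body === '') {
            break; // stop at the first failed or empty download
        }
        file_put_contents($path, $body);
        $saved++;
    }
    return $saved;
}

// Hypothetical helper: parses every cached page and counts its entries.
function count_cached_entries($dir)
{
    $total = 0;
    $paths = glob($dir . '/page_*.xml');
    foreach ($paths ? $paths : array() as $path) {
        $x = simplexml_load_file($path);
        if ($x !== false) {
            $total += count($x->entry);
        }
    }
    return $total;
}

// Demo with a stand-in fetcher instead of a real HTTP request.
$dir = sys_get_temp_dir() . '/feed_cache_' . uniqid();
mkdir($dir);
$fake_fetch = function ($n) {
    return '<feed><entry><title>Listing ' . $n . '</title></entry></feed>';
};
$saved   = cache_pages(1, 3, $dir, $fake_fetch);
$again   = cache_pages(1, 3, $dir, $fake_fetch); // nothing left to fetch
$entries = count_cached_entries($dir);
echo "saved $saved, then $again, entries $entries\n";
```

Splitting the job this way does not make the downloads faster, but it means a run that dies part-way keeps what it already fetched, and the parsing step can be repeated without touching the remote server.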

  • #9
    Senior Coder
    Join Date
    Sep 2010
    Posts
    2,089
    Thanks
    15
    Thanked 246 Times in 246 Posts
    If this is a French script that's causing problems, it probably has accented characters, they could be the trouble. Let us know if that is the case.

  • #10
    Regular Coder
    Join Date
    Oct 2004
    Location
    London E4 UK
    Posts
    320
    Thanks
    0
    Thanked 0 Times in 0 Posts
    ok, by working backwards and removing the code successively, the line that generates the delay or choke or hang is

    PHP Code:
     $xml_source = file_get_contents($feed_url); 
    umm

    ideas welcome
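
One thing that may be worth trying when `file_get_contents()` is the line that stalls is an explicit read timeout, so a stuck request fails instead of hanging. This is a sketch only: `feed_context` and `fetch_page` are hypothetical names, and it assumes the standard http/https stream wrapper honours the `timeout` context option. On builds that replace that wrapper (the phpinfo output later in this thread shows `--with-curlwrappers`) the option may be handled differently, which is untested here.

```php
<?php
// Hypothetical helper: builds a stream context whose http 'timeout' option
// makes a stalled request give up after $seconds.
function feed_context($seconds)
{
    return stream_context_create(array(
        'http' => array(
            'timeout' => $seconds, // read timeout, in seconds
        ),
    ));
}

// Hypothetical wrapper: returns the body, or false when the request fails
// or times out. The @ suppresses the warning so the caller can decide.
function fetch_page($url, $seconds)
{
    return @file_get_contents($url, false, feed_context($seconds));
}

// Show the option that would be sent along with each request.
$opts = stream_context_get_options(feed_context(10));
echo 'timeout option: ', $opts['http']['timeout'], "\n";
```

In the loop from the first post, the call would become something like `$xml_source = fetch_page($feed_url, 10);` followed by a check for `false` before handing the result to `simplexml_load_string()`.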

  • #11
    Regular Coder
    Join Date
    Oct 2004
    Location
    London E4 UK
    Posts
    320
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by DrDOS View Post
    If this is a French script that's causing problems, it probably has accented characters, they could be the trouble. Let us know if that is the case.
    I've dealt with that problem on this site and I'm happy with that problem, I think

    I can load any block of 50-odd pages I want to, so it seems to me to be the quantity of data that is somehow causing the problem. I see accents in the data I do manage to parse, sometimes chewed up, but they come through.

  • #12
    Senior Coder
    Join Date
    Aug 2006
    Posts
    1,311
    Thanks
    11
    Thanked 285 Times in 284 Posts
    Quote Originally Posted by Tynan View Post
    I'm surprised at how slow this script is. I actually have 2,500 pages to parse and at the moment I can't do 80. I'm happy to run this once a night in the middle of the night, but is there another way to tackle this? Perhaps I can simply retrieve the info and save it locally first. I assume the parsing would be lightning fast and the delay is in loading the remote page, in which case this is always going to be a problem.
    You're correct here, the parsing is going to be lightning fast as compared to the http request. What you'd like is to be able to request all the data at once, rather than having to make 2500 individual requests. Maybe there's a different url syntax that can be used to get this?

    FWIW, I ran your script up to 120 requests, and all of them come in about 1.5 seconds apart. No 80s timeout for me, though I'm not on your server. But that means the 80s issue is likely *your* server, not the remote server.

    Dave
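
To check the claim that the HTTP request dominates, the two steps can be timed separately. A rough sketch, with `time_page` and the sleeping stand-in fetcher as assumptions; swapping in a real fetcher would show the actual split on a given server.

```php
<?php
// Hypothetical helper: times the download and the parse of one page
// separately, so it is clear which step the seconds go to.
function time_page($fetch, $n)
{
    $t0   = microtime(true);
    $body = call_user_func($fetch, $n);
    $t1   = microtime(true);
    $x    = simplexml_load_string($body);
    $t2   = microtime(true);

    return array(
        'fetch_s' => $t1 - $t0,
        'parse_s' => $t2 - $t1,
        'entries' => ($x === false) ? 0 : count($x->entry),
    );
}

// Stand-in fetcher that sleeps briefly to imitate network latency.
$slow_fetch = function ($n) {
    usleep(50000); // 50 ms
    return '<feed><entry/><entry/></feed>';
};

$t = time_page($slow_fetch, 1);
printf("fetch %.3fs, parse %.3fs, %d entries\n",
    $t['fetch_s'], $t['parse_s'], $t['entries']);
```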

  • #13
    Regular Coder
    Join Date
    Oct 2004
    Location
    London E4 UK
    Posts
    320
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by tracknut View Post
    You're correct here, the parsing is going to be lightning fast as compared to the http request. What you'd like is to be able to request all the data at once, rather than having to make 2500 individual requests. Maybe there's a different url syntax that can be used to get this?

    FWIW, I ran your script up to 120 requests, and all of them come in about 1.5 seconds apart. No 80s timeout for me, though I'm not on your server. But that means the 80s issue is likely *your* server, not the remote server.

    Dave
    Thank you for that, that makes sense too. I'd love to load it all at once, but their own notes say load it a page at a time. It makes no sense to me, and it doesn't help that they stick the whole bloody world into a single feed.

    I agree this is a local timeout issue, a new one for me, but I suppose I've never had an 80s script before.

    Is there some elegant way of running the script over and over? This is an end of the day job for me when I'm frankly dulled.

    As soon as there's a single script trying to run this or tie it together it's going to time out, isn't it?

    If I had hundreds of copies of the script, each loading 50 pages, all tied together in a CRON job(s), that would work around this limitation, wouldn't it?

    It just seems ridiculous
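
Rather than hundreds of copies of the script, one script that remembers where it stopped could be scheduled to run repeatedly. A minimal sketch under stated assumptions: `run_batch`, the state file and the batch size of 50 are hypothetical, and the stand-in handler only records page numbers where the real one would fetch and parse the page.

```php
<?php
// Hypothetical helper: processes at most $batch pages per run and stores
// the next page number in $state_file, so each scheduled run continues
// where the previous one stopped.
function run_batch($state_file, $batch, $last_page, $handler)
{
    $start = is_file($state_file) ? (int) file_get_contents($state_file) : 1;
    if ($start > $last_page) {
        return 0; // every page has been processed
    }
    $end = min($start + $batch - 1, $last_page);
    for ($n = $start; $n <= $end; $n++) {
        call_user_func($handler, $n);
        // Record progress after each page so an aborted run resumes here.
        file_put_contents($state_file, (string) ($n + 1));
    }
    return $end - $start + 1;
}

// Demo: 120 pages in batches of 50, with a stand-in handler.
$state = tempnam(sys_get_temp_dir(), 'feedpos');
unlink($state); // start with no saved position
$seen = array();
$handler = function ($n) use (&$seen) {
    $seen[] = $n;
};
$first  = run_batch($state, 50, 120, $handler);
$second = run_batch($state, 50, 120, $handler);
$third  = run_batch($state, 50, 120, $handler);
$fourth = run_batch($state, 50, 120, $handler);
echo "$first $second $third $fourth\n";
```

With this shape, the `delete from FrenchRentals` step in the original script would need to run only when a new cycle starts (for example when the state file is absent), otherwise each batch would wipe the rows saved by the one before it.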

  • #14
    Senior Coder
    Join Date
    Aug 2006
    Posts
    1,311
    Thanks
    11
    Thanked 285 Times in 284 Posts
    Have you called phpinfo() to see what your timeout value is, see if it actually is 80? Also see whether you're running in safe mode.
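
As an alternative to scanning the full `phpinfo()` page, the handful of settings relevant here can be printed directly. A small sketch; the helper name `timeout_settings` and the choice of which settings to show are assumptions.

```php
<?php
// Hypothetical helper: collects the settings most relevant to a script
// that stalls while reading remote pages.
function timeout_settings()
{
    return array(
        'max_execution_time'     => ini_get('max_execution_time'),
        'default_socket_timeout' => ini_get('default_socket_timeout'),
        // ini_get() returns false for unknown settings; safe mode was
        // removed in PHP 5.4, so false is expected on newer versions.
        'safe_mode'              => ini_get('safe_mode'),
        'sapi'                   => php_sapi_name(),
    );
}

foreach (timeout_settings() as $name => $value) {
    echo $name, ' = ', var_export($value, true), "\n";
}
```

These values only cover PHP's own limits. A cut-off imposed by the web server or the hosting setup would not show up here.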

  • #15
    Regular Coder
    Join Date
    Oct 2004
    Location
    London E4 UK
    Posts
    320
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by tracknut View Post
    Have you called phpinfo() to see what your timeout value is, see if it actually is 80? Also see whether you're running in safe mode.
    phpinfo, nothing useful there is there?

    PHP Version 5.3.25

    System Linux hp26.hostpapa.com 2.6.18-348.12.1.el5 #1 SMP Wed Jul 10 05:28:41 EDT 2013 x86_64
    Build Date Jun 4 2013 23:31:50
    Configure Command './configure' '--disable-fileinfo' '--disable-phar' '--enable-bcmath' '--enable-calendar' '--enable-exif' '--enable-ftp' '--enable-gd-native-ttf' '--enable-libxml' '--enable-magic-quotes' '--enable-mbstring' '--enable-pdo=shared' '--enable-zip' '--prefix=/usr' '--with-curl=/opt/curlssl/' '--with-curlwrappers' '--with-freetype-dir=/usr' '--with-gd' '--with-gettext' '--with-imap=/opt/php_with_imap_client/' '--with-imap-ssl=/usr' '--with-jpeg-dir=/usr' '--with-kerberos' '--with-libdir=lib64' '--with-libexpat-dir=/usr' '--with-libxml-dir=/opt/xml2/' '--with-mcrypt=/opt/libmcrypt/' '--with-mysql=/usr' '--with-mysql-sock=/var/lib/mysql/mysql.sock' '--with-mysqli=/usr/bin/mysql_config' '--with-openssl=/usr' '--with-openssl-dir=/usr' '--with-pcre-regex=/opt/pcre' '--with-pdo-mysql=shared' '--with-pdo-sqlite=shared' '--with-pic' '--with-png-dir=/usr' '--with-pspell' '--with-sqlite=shared' '--with-tidy=/opt/tidy/' '--with-xmlrpc' '--with-xpm-dir=/usr' '--with-xsl=/opt/xslt/' '--with-zlib' '--with-zlib-dir=/usr'
    Server API CGI/FastCGI
    Virtual Directory Support disabled
    Configuration File (php.ini) Path /usr/lib
    Loaded Configuration File /usr/local/lib/php.ini
    Scan this dir for additional .ini files (none)
    Additional .ini files parsed (none)
    PHP API 20090626
    PHP Extension 20090626
    Zend Extension 220090626
    Zend Extension Build API220090626,NTS
    PHP Extension Build API20090626,NTS
    Debug Build no
    Thread Safety disabled
    Zend Memory Manager enabled
    Zend Multibyte Support disabled
    IPv6 Support enabled
    Registered PHP Streams compress.zlib, dict, ftp, ftps, gopher, http, https, imap, imaps, pop3, pop3s, rtsp, smtp, smtps, telnet, tftp, php, file, glob, data, zip
    Registered Stream Socket Transports tcp, udp, unix, udg, ssl, sslv3, sslv2, tls
    Registered Stream Filters zlib.*, convert.iconv.*, mcrypt.*, mdecrypt.*, string.rot13, string.toupper, string.tolower, string.strip_tags, convert.*, consumed, dechunk

    Zend logo This program makes use of the Zend Scripting Language Engine:
    Zend Engine v2.3.0, Copyright (c) 1998-2013 Zend Technologies
    with the ionCube PHP Loader v4.2.2, Copyright (c) 2002-2012, by ionCube Ltd., and
    with Zend Guard Loader v3.3, Copyright (c) 1998-2010, by Zend Technologies
    with Suhosin v0.9.33, Copyright (c) 2007-2012, by SektionEins GmbH

