Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 6 of 6
  1. #1
    New Coder
    Join Date
    Mar 2010
    Posts
    10
    Thanks
    1
    Thanked 0 Times in 0 Posts

    Parse XML - Write to a database --- PLEASE HELP!

    This will kill me, it is taking my whole work day - each day, checking for errors so my visitors don't notice...

    BACKGROUND:
    I have a financial website that display's Stock Market Data from my country's stock exchange.
    Stock Market Data is read from an XML file and written to a database.
    Data is refreshed every ten minutes (around 40 times per day), for 200+ stocks.

    PROBLEM:
    Most of the time everything works fine, but, several times each day, during a refresh, a data for a random stock, usually only one, is read only partialy.
    (for example. instead of HighPrice=320.99, to a database is written HighPrice=99, or, instead of DailyLow=4270.01, database contains DailyLow=70.01 )

    PROBLEM SCRIPTS:
    XML.class.php - parses XML data and places it in a structured PHP array
    ZSE.class.php - reads the data and writes it to a database. (I suspect function getMarketTradingData could be an issue (on lines 167-209), everything above just writes to a database)

    Potentially important facts:
    • XML is allways clean of errors.
    • Time of data collection(refresh) is not important, sometimes the error appears at 10:25, sometimes at 12:35, sometimes at 15:45, etc...
    • Usually the error is limitied to only one stock out of 200+, rarely two or three of them have errors.
    • Data is collected for several fields - but only one has an error: High Price, Low Price, Closing Price, Buy Price, Sell Price, Volume, Average Price. But every time the data is read partially it is limited to only one field. For example, sometimes the Closing price has an error (to a database is written 16 instead of 316), sometimes some other price like daily low price or average price.
    • Dellimiter is not an issue, the data is read partially independent of it. For e.g. sometimes instead of Average Price=24.49 to a database is written only the decimal part 49, sometimes instead of SellPrice=3420 database contains SellPrice=20.


    Example:
    Attached Files Attached Files

  • #2
    Senior Coder tomws's Avatar
    Join Date
    Nov 2007
    Location
    Arkansas
    Posts
    2,644
    Thanks
    29
    Thanked 330 Times in 326 Posts
    I don't know what database you're using, but if it and the tables support transactions, you should consider implementing them, if you're not already. Then you can parse-and-insert an usual, but inside a transaction. Before commit, run a comparison between the XML and inserted data. If match, commit the transaction.

    If you don't have transactions available, there are probably other ways to do something similar.

    I'm not sure what other options are available, but you apparently need to implement some sort of error-checking since you can't trust your code.
    Are you a Help Vampire?

  • #3
    Senior Coder tomws's Avatar
    Join Date
    Nov 2007
    Location
    Arkansas
    Posts
    2,644
    Thanks
    29
    Thanked 330 Times in 326 Posts
    Some other thoughts after looking at the code:

    How large is the XML file? Does it significantly vary in size between updates? If variable size, can you find a pattern between the corrupted data and the byte location in the file?

    Is the data always consistently formatted? There are never stray characters or spaces in these problem areas?

    The getMarketTradingData function seems to pull data without attempting test or format any of it. If you expect a float or integer, you can add extra error-checking with functions like is_float or is_int.
    Are you a Help Vampire?

  • Users who have thanked tomws for this post:

    maros174 (03-04-2010)

  • #4
    New Coder
    Join Date
    Mar 2010
    Posts
    10
    Thanks
    1
    Thanked 0 Times in 0 Posts
    tomws,
    thanks for looking at the code... every little bit helps. This thing is killing me.

    Quote Originally Posted by tomws View Post
    I don't know what database you're using, but if it and the tables support transactions, you should consider implementing them, if you're not already...
    it's MySQL. I don't know if transactions are implemented or even supported.

    Web development company made the site for me, since then, because of the recession, they've gone bust, and now I'm left to my own knowledge which is very limited.

    Quote Originally Posted by tomws View Post
    How large is the XML file? Does it significantly vary in size between updates? If variable size, can you find a pattern between the corrupted data and the byte location in the file?
    It's around 130 KB. I think it's more or less the same size.

    It is online: (http://webservice.zse.hr/delayed/Del...MarketData.xml),
    and it is refreshed during stock market working hours 10h-16h


    Quote Originally Posted by tomws View Post
    Is the data always consistently formatted? There are never stray characters or spaces in these problem areas?
    No, it's perfectly clean every time. I checked this.

    Nothing distinguishes the area from which only the partial number is "pulled", from thousands of other exactly the same areas from which the number is pulled correctly.

    I took copies of the file on several occasions when the parser made the error. I can upload them if they can be helpful.

    Quote Originally Posted by tomws View Post
    The getMarketTradingData function seems to pull data without attempting test or format any of it. If you expect a float or integer, you can add extra error-checking with functions like is_float or is_int.
    I don't think it would help. The expected number can be float or an integer, both of which is correct.
    The data that is only partially parsed from an xml and wrongly written to the database looks something like this:
    157 instead of 2157, or 99 instead of 211.99, or 5.33 instead of 445.33
    Last edited by maros174; 03-02-2010 at 07:21 PM.

  • #5
    Senior Coder tomws's Avatar
    Join Date
    Nov 2007
    Location
    Arkansas
    Posts
    2,644
    Thanks
    29
    Thanked 330 Times in 326 Posts
    MySQL can use transactions in InnoDB tables, but not MyISAM. (I think that's still the case.)

    Past that, I think I'm not skilled enough to diagnose the problem. The only things I could offer would be adding extra layers of data integrity testing. Perhaps someone with more large-scale experience has some ideas on this.
    Are you a Help Vampire?

  • #6
    New Coder
    Join Date
    Mar 2010
    Posts
    10
    Thanks
    1
    Thanked 0 Times in 0 Posts
    Thanks anyway.

    I hope someone will be able to help.
    I can easily see myself going nuts checking more than a 1000 data points every 10 minutes every working day...


  •  

    Posting Permissions

    • You may not post new threads
    • You may not post replies
    • You may not post attachments
    • You may not edit your posts
    •