  1. #1
    New Coder
    Join Date
    Jul 2002
    Posts
    15
    Thanks
    0
    Thanked 0 Times in 0 Posts

    load data from huge file into an array as fast as possible

    Hi,

    I've got certain data which I can't store in a db (it is too much data, don't ask) so I'll store it in an optimized file structure. Now I want to read the data from that file and get it into a specific array format:

    $file_array[$id_a][$id_b] = $value; with data being like:

    $file_array[1023][50123435] = 10023;
    $file_array[1023][50035768] = 00234;
    $file_array[1023][50003452] = 00037;
    $file_array[1023][50002345] = 00002;
    $file_array[566978][50000343] = 023493;
    $file_array[566978][50123435] = 004543;
    $file_array[566978][50003452] = 000039;

    The number of items in this array can get up to 2 million!! It currently takes about 4 seconds to load the array with 1.6M items from a file which is stored like:

    1023 => array ( '50123435' => '10023', '50035768' => '00234', '50003452' => '00037', '50002345' => '00002')
    566978 => array ( '50000343' => '023493', '50123435' => '004543', '50003452' => '000039')

    Is there a better way to store the data in the file from which I can fill the array more quickly (<1sec.)?

    Thanks!
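
    (One storage format that is sometimes tried for this kind of data is a plain serialize() dump of the whole array, loaded back with unserialize(). Whether it actually beats the current 4 seconds would need measuring; the file name below is only an example.)

    PHP Code:
    // Assumes data.ser was written earlier with file_put_contents('data.ser', serialize($file_array)).
    $file_array = unserialize(file_get_contents('data.ser'));

    echo $file_array[1023][50123435]; // '10023'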

  • #2
    New Coder
    Join Date
    Feb 2005
    Posts
    10
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Databases are supposed to load MUCH faster than flat files, as they're designed to retrieve data directly from the disk sectors, unlike flat files, where every access has to go through certain OS restrictions. I'd suggest using a database and changing the column types so they can hold large amounts of data.

    You might also want to try splitting the data into separate tables; for example, a forum's 'subject' and 'message' are broken into two separate tables to keep each forum table smaller.

    Indexing the database tables might also help; most current database systems have indexing features that allow faster retrieval and more efficient sorting of data.

    MySQL is capable of handling very large databases. phpBB's own forums are an example: they have over 10 million threads, and the site isn't slow at all.
    Kenetix:: Achieving more than the ordinary.
    http://www.kenetix.net

  • #3
    Regular Coder ralph l mayo's Avatar
    Join Date
    Nov 2005
    Posts
    951
    Thanks
    1
    Thanked 31 Times in 29 Posts
    Denormalized file structures can be much faster than a database when you don't care about things like transactions and atomicity, the overhead of which dwarfs whatever operating-system costs the database doesn't also have to pay.

    IIRC phpBB archives threads to separate tables so only a small number of however many millions total are ever a factor in most queries.

    To the OP: please post how you're reading the data now and what the file structure is. Ideally you'd load the data into memory once and serve it many times, meaning load time wouldn't be so much a concern as random access time, which is probably acceptable. Maybe someone can clarify how you can share memory in PHP like you can in the mod_* extensions.

    If this really needs to be fast you could write an Apache module in C to read the data and share it with PHP clients.
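
    (On the shared-memory question above: the shmop extension is one way to make a block of data visible to every PHP process on the machine without writing an Apache module. A minimal sketch, assuming some loader script has already built $file_array; the key, permissions, and size handling here are only examples, and each reader still pays the unserialize() cost.)

    PHP Code:
    // Writer -- run once by a loader script:
    $key  = ftok(__FILE__, 'd');                 // agreed-upon System V IPC key
    $blob = serialize($file_array);              // $file_array built elsewhere
    $shm  = shmop_open($key, 'c', 0644, strlen($blob));
    shmop_write($shm, $blob, 0);

    // Reader -- run on each request:
    $shm  = shmop_open($key, 'a', 0, 0);         // attach read-only
    $blob = shmop_read($shm, 0, shmop_size($shm));
    $file_array = unserialize($blob);            // rebuild the array locally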

  • #4
    Senior Coder CFMaBiSmAd's Avatar
    Join Date
    Oct 2006
    Location
    Denver, Colorado USA
    Posts
    3,132
    Thanks
    2
    Thanked 328 Times in 320 Posts
    Since you don't indicate what this data is, how it is used, or how often it gets updated..., we can only guess, but I'll guess that you only display, access, or process a small, select portion of it on any web page that you output to a browser? If so, you can save yourself a lot of loading time by using a database and only accessing the data you need on any particular page.
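
    (A minimal sketch of that approach, assuming a hypothetical table file_data(id_a, id_b, value) and PDO; only the rows for the id_a group a page actually needs are fetched, instead of the whole data set.)

    PHP Code:
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

    // Fetch only one id_a group for this page.
    $stmt = $pdo->prepare('SELECT id_b, value FROM file_data WHERE id_a = ?');
    $stmt->execute(array(1023));

    $file_array = array();
    foreach ($stmt as $row) {
        $file_array[1023][$row['id_b']] = $row['value'];
    }
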
    If you are learning PHP, developing PHP code, or debugging PHP code, do yourself a favor and check your web server log for errors and/or turn on full PHP error reporting in php.ini or in a .htaccess file to get PHP to help you.

  • #5
    Senior Coder
    Join Date
    Aug 2003
    Location
    One step ahead of you.
    Posts
    2,815
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Using a database with a properly structured and indexed table should do the trick.
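
    (For what "properly structured and indexed" could look like here, a minimal sketch; the table and column names are only examples, and the composite primary key on (id_a, id_b) is what lets the per-id_a lookups use an index.)

    PHP Code:
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
    $pdo->exec('
        CREATE TABLE file_data (
            id_a  INT UNSIGNED NOT NULL,
            id_b  INT UNSIGNED NOT NULL,
            value VARCHAR(8)   NOT NULL,  -- string only to keep the leading zeros in the sample
            PRIMARY KEY (id_a, id_b)
        )
    ');
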
    I'm not sure if this was any help, but I hope it didn't make you stupider.

    Experience is something you get just after you really need it.
    PHP Installation Guide Feedback welcome.

  • #6
    Senior Coder
    Join Date
    Sep 2005
    Posts
    1,791
    Thanks
    5
    Thanked 36 Times in 35 Posts
    You can put things in memory with something like memcached or with the APC caching extension. With memcached:
    PHP Code:
    $memcache = new Memcache;
    $memcache->connect('localhost', 11211) or die("Could not connect");

    if (false === ($data = $memcache->get('big_data'))) {
        $data = get_lots_of_data(); // this takes a long time
        $memcache->set('big_data', $data);
    }

    or with apc:
    PHP Code:
    if (false === ($data = apc_fetch('big_data'))) {
        $data = get_lots_of_data(); // this takes a long time
        apc_store('big_data', $data);
    }

    http://php.net/memcache
    http://php.net/apc
    My thoughts on some things: http://codemeetsmusic.com
    And my scrapbook of cool things: http://gjones.tumblr.com

  • #7
    Senior Coder
    Join Date
    Aug 2003
    Location
    One step ahead of you.
    Posts
    2,815
    Thanks
    0
    Thanked 3 Times in 3 Posts
    I'm not sure if this was any help, but I hope it didn't make you stupider.

    Experience is something you get just after you really need it.
    PHP Installation Guide Feedback welcome.

  • #8
    New Coder
    Join Date
    Jul 2002
    Posts
    15
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Sorry for my late response; I've been away and the subject recently became a priority again.

    In total, many gigabytes (>250GB) of these numbers are stored, and every time (several requests every second) a different part of the total is requested.

    A single request can involve up to 2M items, spread over id => array(...) groups represented like this:

    1023 => array ( '50123435' => '10023', '50035768' => '00234', '50003452' => '00037', '50002345' => '00002')
    566978 => array ( '50000343' => '023493', '50123435' => '004543', '50003452' => '000039')

    This is just a very small sample.
    I now turn them into an array to use for calculations:

    $file_array[1023][50123435] = 10023;
    $file_array[1023][50035768] = 00234;
    $file_array[1023][50003452] = 00037;
    $file_array[1023][50002345] = 00002;
    $file_array[566978][50000343] = 023493;
    $file_array[566978][50123435] = 004543;
    $file_array[566978][50003452] = 000039;

    When finally loaded into $file_array, some calculations are done: find the items that the lists have in common and then add their values:

    both 1023 and 566978 have item 50123435 with total value 10023+004543
    both 1023 and 566978 have item 50003452 with total value 00037+000039

    Every item is evaluated like this, and only the items that appear in all of the lists (there can be three, four, five, etc.) are finally sent back.
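
    (A minimal sketch of that intersection-and-sum step, assuming $file_array already holds the lists exactly as above; array_intersect_key() keeps only the item ids present in every list.)

    PHP Code:
    // $file_array[id_a][id_b] = value, as loaded above.
    $lists = array_values($file_array);

    // Keep only the item ids (id_b) that appear in every list...
    $common = array_shift($lists);
    foreach ($lists as $list) {
        $common = array_intersect_key($common, $list);
    }

    // ...then sum the values per shared item across all lists.
    $totals = array();
    foreach ($common as $id_b => $unused) {
        $totals[$id_b] = 0;
        foreach ($file_array as $list) {
            $totals[$id_b] += (int) $list[$id_b];
        }
    }
    // $totals[50123435] == 10023 + 4543, $totals[50003452] == 37 + 39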

