  1. #1
    Regular Coder

    Efficient way of inserting thousands of rows into database?

    I'm building an application that parses chatlog files line by line and inserts them into a database. I currently have this bit of code:

    PHP Code:
    if (!empty($this->goesIntoDatabase)) {
        foreach ($this->goesIntoDatabase as $id => $text) {
            $data = explode('|', $id);

            $this->DB->e("
                INSERT INTO `data` (channelID, time, author, text)
                VALUES (?, UNIX_TIMESTAMP(?), ?, ?)",
                $channelInternalID,
                $data[0],
                $data[1],
                $text);
        }
    }
    It basically runs through an array that contains data, and inserts a row into the database.

    This is all fine and dandy, but in some extreme circumstances this data could be thousands of lines long. I have one example here of 9,100 lines, and it takes 6.523 seconds to run through it (which, for the amount of data, isn't too bad).

    I'm trying to refine it a bit. I was wondering if there's a more efficient way of doing bulk INSERTs rather than doing them one at a time. I've read that transactions might be very useful in this situation. I've also heard about dumping the data into a file in a certain format and then loading it via a SQL statement. I'm looking for more opinions/advice/thoughts on what can be done.
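    For reference, here's a rough sketch of what the transaction approach might look like. It assumes a PDO connection (the $pdo handle, the credentials and the $rows array are placeholders, not my actual wrapper) and an InnoDB table, since MyISAM ignores transactions; the point is that MySQL commits once per batch instead of once per row.

    PHP Code:
    // Placeholder connection - swap in your own DSN/credentials.
    $pdo = new PDO('mysql:host=localhost;dbname=chatlogs', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Same statement as above, prepared once and reused for every row.
    $stmt = $pdo->prepare('
        INSERT INTO `data` (channelID, time, author, text)
        VALUES (?, UNIX_TIMESTAMP(?), ?, ?)');

    $pdo->beginTransaction();
    try {
        foreach ($rows as $id => $text) {        // $rows stands in for goesIntoDatabase
            $data = explode('|', $id);
            $stmt->execute(array($channelInternalID, $data[0], $data[1], $text));
        }
        $pdo->commit();                          // one commit for the whole batch
    } catch (Exception $e) {
        $pdo->rollBack();                        // nothing is half-inserted on failure
        throw $e;
    }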

    Thanks!

  • #2
    Super Moderator
    This may not be entirely helpful, but a few years ago I had to put around 100,000 rows into a database. phpMyAdmin took around 5 minutes. Running mysql from the terminal and piping the SQL directly into it took around 5 seconds. I don't know why there's such an extreme difference in speed...

    Have you looked into INSERT DELAYED?
    http://dev.mysql.com/doc/refman/5.1/...t-delayed.html

    I doubt the data will actually be added any faster, but at least your script should finish pretty quickly and you can just leave MySQL to get on with it.

  • #3
    Regular Coder
    Well, that kinda helped, but not enough. It dropped it down by about 1-2 seconds... so instead of 6-7 seconds it's now about 5.

    EDIT: Also, while I'm currently using MyISAM, I plan to switch to InnoDB for transaction support down the road... and DELAYED doesn't work with InnoDB.

  • #4
    Super Moderator
    Two suggestions then:

    1. If you have shell access, dump it to a file and load it with the mysql CLI client. You can do this by writing SQL statements to the file. You can also use the LOAD DATA INFILE statement to load a CSV-like file, as documented here: http://dev.mysql.com/doc/refman/5.1/en/load-data.html - though I find a lot of hosts have the required permissions for this feature disabled by default.

    2. If you're using INSERT DELAYED, the number of separate statements may be the issue. You can cut this down by combining multiple VALUES lists into one statement, like so:

    Code:
    INSERT DELAYED INTO `table` (`fields`, `fields`...) VALUES (...), (...), (...);
    You'll have to figure out how big you can make the individual INSERT statements before PHP runs out of memory or otherwise causes performance issues.
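    Something like this rough sketch, assuming a PDO connection ($pdo) rather than your own wrapper; the batch size of 500 is an arbitrary starting point - tune it so each statement stays under max_allowed_packet:

    PHP Code:
    // Batch the parsed rows into multi-row INSERTs, a few hundred rows per statement.
    $batchSize = 500;
    $chunks    = array_chunk($rows, $batchSize, true);   // keep the id => text keys

    foreach ($chunks as $chunk) {
        $placeholders = array();
        $params       = array();

        foreach ($chunk as $id => $text) {
            $data           = explode('|', $id);
            $placeholders[] = '(?, UNIX_TIMESTAMP(?), ?, ?)';
            $params[]       = $channelInternalID;
            $params[]       = $data[0];
            $params[]       = $data[1];
            $params[]       = $text;
        }

        $sql = 'INSERT INTO `data` (channelID, time, author, text) VALUES '
             . implode(', ', $placeholders);
        $pdo->prepare($sql)->execute($params);   // one round trip per batch
    }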

    You might also consider storing time() in a variable, and passing that directly instead of UNIX_TIMESTAMP() - it's extra parsing and system calls each time.

  • #5
    Regular Coder
    I'm trying to create a portable app, so using shell isn't helpful.

    I also tried combining multiple values into one statement, but I went past the max packet size (max_allowed_packet). I'm sure I can increase this, but again, many people can't, so I'd rather not mess with it.

    Also, each insert has a different time, which is why I have UNIX_TIMESTAMP running; otherwise I would do as you suggested.

    But I think I have it under control now. I tried LOAD DATA INFILE and that seems to work like a charm! Where 9,000 rows used to take 6 seconds, it now takes, on average, 600-700ms. Very doable. I may also be able to optimize that just a tad bit more...

    =)
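    For reference, a rough sketch of the LOAD DATA approach (not my exact code): dump the parsed rows to a temporary tab-delimited file, then load it in one statement. It assumes a PDO connection with LOCAL INFILE enabled, and strtotime() is just a stand-in for the UNIX_TIMESTAMP() conversion.

    PHP Code:
    // LOCAL INFILE has to be allowed on both the client and the server.
    $pdo = new PDO('mysql:host=localhost;dbname=chatlogs', 'user', 'pass',
        array(PDO::MYSQL_ATTR_LOCAL_INFILE => true));
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Write the parsed rows to a temporary tab-delimited file.
    $tmpFile = tempnam(sys_get_temp_dir(), 'chatlog_');
    $fh      = fopen($tmpFile, 'w');

    foreach ($rows as $id => $text) {
        $data = explode('|', $id);
        // Crude escaping: strip the delimiter characters from the text.
        $clean = str_replace(array("\t", "\r", "\n"), ' ', $text);
        // channelID, unix time, author, text - one tab-separated row per line.
        fwrite($fh, $channelInternalID . "\t" . strtotime($data[0]) . "\t"
                  . $data[1] . "\t" . $clean . "\n");
    }
    fclose($fh);

    // Load the whole file in one statement.
    $pdo->exec("
        LOAD DATA LOCAL INFILE " . $pdo->quote($tmpFile) . "
        INTO TABLE `data`
        FIELDS TERMINATED BY '\\t'
        LINES TERMINATED BY '\\n'
        (channelID, time, author, text)");

    unlink($tmpFile);   // clean up the temporary file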

  • #6
    Super Moderator
    Great, that's good news.

    One note about LOAD DATA INFILE and portability: you might want to check for errors after running it and fall back to the old slow insert if it fails. The last time I used LOAD DATA INFILE was on a cPanel box (arguably one of the most popular web server control panels) and I had to play around with the user permissions to get it working.
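    Something along these lines, assuming the same hypothetical $pdo connection (with exceptions enabled) and temp file as the sketch in the previous post:

    PHP Code:
    // Try the fast path first; if LOAD DATA is rejected (missing FILE/LOCAL
    // privileges, local_infile disabled), fall back to row-by-row inserts.
    try {
        $pdo->exec("
            LOAD DATA LOCAL INFILE " . $pdo->quote($tmpFile) . "
            INTO TABLE `data`
            FIELDS TERMINATED BY '\\t'
            LINES TERMINATED BY '\\n'
            (channelID, time, author, text)");
    } catch (PDOException $e) {
        // Fallback: the original one-row-at-a-time prepared statement.
        $stmt = $pdo->prepare('
            INSERT INTO `data` (channelID, time, author, text)
            VALUES (?, UNIX_TIMESTAMP(?), ?, ?)');

        foreach ($rows as $id => $text) {
            $data = explode('|', $id);
            $stmt->execute(array($channelInternalID, $data[0], $data[1], $text));
        }
    }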

