
Thread: Best way to insert 7 million rows to a table

  1. #1

    Thread Starter
    PowerPoster make me rain's Avatar
    Join Date
    Sep 2008
    Location
    india/Hubli
    Posts
    2,208

    Best way to insert 7 million rows to a table

    With MySQL 5.7+, how do I insert 7 million rows into a table? The table has two triggers, an AFTER UPDATE and an AFTER DELETE, which alter data in another table and which are unavoidable for me.

    The query (a .sql file execution) has been running for the past 56 hours but still isn't even 15% complete.

    What is the best way out, please?
    Is it possible to bypass the triggers?


    (I have tried deleting the triggers, but recreating them took even more time; I believe because it may run a check for each row.)

    Any advice please
    The averted nuclear war
    My notes:

    PrOtect your PC. MSDN Functions .OOP LINUX forum
    .LINQ LINQ videos
    If someone helps you, please rate them without fail; the forum doesn't expect anything other than this

  2. #2
    A SQL Server fool GaryMazzone's Avatar
    Join Date
    Aug 2005
    Location
    Dover,NH
    Posts
    7,495

    Re: Best way to insert 7 million rows to a table

    Personally I would run this into a new table (mimic the structure of the real table, without indexes or triggers). Once the data is loaded, start the next part: moving the data to the actual table.
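    In MySQL terms, that could look roughly like this (table and file names are made up for illustration; `orders` stands in for the real table):

    ```sql
    -- CREATE TABLE ... SELECT copies the column definitions only,
    -- so the staging table has no indexes, triggers, or foreign keys.
    CREATE TABLE orders_staging AS SELECT * FROM orders LIMIT 0;

    -- Bulk-load the file straight into the bare staging table; with no
    -- triggers or index maintenance this part is fast even for 7M rows.
    LOAD DATA LOCAL INFILE '/tmp/orders.csv'
    INTO TABLE orders_staging
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    LINES TERMINATED BY '\n'
    IGNORE 1 LINES;

    -- Then move the data across in one set-based statement (or in batches).
    INSERT INTO orders SELECT * FROM orders_staging;
    ```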
    Sometimes the Programmer
    Sometimes the DBA

    Mazz1

  3. #3
    Smooth Moperator techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,532

    Re: Best way to insert 7 million rows to a table

    While I agree with Gary's assessment of using a second table, I'm going to take it a step further with this explanation.

    A large part of my career in the past has been the import/export of data in/out of systems. What I have found to work best is this:
    1) Create a table that as close as possible resembles the data you're trying to import. All fields are text and nullable. This is the staging table.
    2) BULK COPY the data into the staging table. Just get it all in there.
    3) Run a process that then does a data scrub. Check for invalid values, bad dates, anything that won't convert properly or will cause the system to choke if it hits the live tables. This includes the NULL values. Records that pass get marked.
    4) Once the data is scrubbed, and is clean, copy the clean records (that's why we mark them) and insert them into the final production table.
    5) Remove the cleaned records from the staging table.

    Step 2 can happen very quickly even with 7M records. It will slow down with step 4 especially if you have triggers. Although there is probably a way to bulk insert from the staging table to the final table skipping the logging and triggers. The question is: do you want the triggers to happen on the new data, or is it irrelevant? If you do want them, then copy the data over in batches. Depending on what the triggers do, you may have to go in batches as small as 1000, or if they're mostly clean, you can do 10,000 at a time, or maybe more... play around with the size until you get an acceptable performance level.
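    A rough MySQL sketch of the batched copy in steps 4 and 5, assuming the staging table has an AUTO_INCREMENT `id` and an `is_clean` flag set by the scrub step (all names hypothetical):

    ```sql
    -- Each pass copies one batch of clean rows into the real table, then
    -- removes the same rows from staging (step 5), so the next pass picks
    -- up where this one left off. Repeat until 0 rows are affected.
    START TRANSACTION;

    INSERT INTO orders (col1, col2)
      SELECT col1, col2
      FROM orders_staging
      WHERE is_clean = 1
      ORDER BY id
      LIMIT 10000;          -- tune this batch size for acceptable performance

    DELETE FROM orders_staging
      WHERE is_clean = 1
      ORDER BY id
      LIMIT 10000;          -- MySQL allows ORDER BY/LIMIT on single-table DELETE

    COMMIT;
    ```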

    Option B: Use SSIS (I'm assuming you're using SQL Server)... and create a package that pulls in the file, scrubs the data, tosses out bad records, and inserts the new data into tables. This is the kind of stuff it's designed for. And if I remember right, you can turn off the triggers through configuration in the pipeline when inserting into a table.

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  4. #4
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,900

    Re: Best way to insert 7 million rows to a table

    To be honest, those triggers are probably the root of your problem. As long as they're there, the database can really only process one row at a time, so "fast" approaches like BCP are out. Integrity constraints like foreign keys and programmed constraints also slow things down. Finally, so does any form of index. So, if we can, we want to turn all of that off for the duration of the insert and then turn it back on again.

    However, I wouldn't just drop the triggers and run everything in. Those triggers are, presumably, there for a reason and by dropping them you could be creating all sorts of inconsistencies in the data.

    I agree with TG and GM's suggestion to BCP into a staging table first. That's going to give you all sorts of benefits such as the ability to cleanse the data as TG describes. But I think the stage after that should be to investigate and understand exactly what those triggers are doing.

    Ideally you should develop a package that allows you to turn off the triggers and emulate their effect directly. E.g. if a trigger is writing to an audit table, write your own insert to that audit trail. By taking this approach you can temporarily take the database offline, script and drop all triggers, foreign keys, indexes etc., and your inserts should then be much quicker. You then use those scripts to recreate everything you dropped and bring the database back online.
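    In MySQL the script-then-drop step might be sketched like this (trigger, constraint and index names are placeholders; save the SHOW CREATE output somewhere safe before dropping anything):

    ```sql
    -- Capture definitions so everything can be recreated afterwards.
    SHOW CREATE TRIGGER trg_orders_after_update;  -- save this output
    SHOW CREATE TABLE orders;                     -- lists indexes and foreign keys

    DROP TRIGGER trg_orders_after_update;
    ALTER TABLE orders DROP FOREIGN KEY fk_orders_customer;
    ALTER TABLE orders DROP INDEX idx_orders_date;

    -- Session-level switches that also speed up big loads.
    SET unique_checks = 0;
    SET foreign_key_checks = 0;

    -- ... run the bulk insert here, plus your own statements that emulate
    -- what the triggers would have done (e.g. the audit-table writes) ...

    SET foreign_key_checks = 1;
    SET unique_checks = 1;
    -- Finally, recreate the trigger, key and index from the saved scripts.
    ```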

    BUT this approach requires a great deal of rigour on your part. You need to make sure that you've developed a package that puts all the right data in all the right tables and correctly maintains integrity between those tables. You absolutely need to make sure you script everything you drop so you're sure you can recreate it correctly. I shouldn't need to say this to anyone who works with databases but, just in case: TAKE A BACKUP FIRST!!

    AND this requires that you are able to take the database offline while you run the process. If your system needs to be continuously available then there really isn't much of an option except to leave all the triggers etc. in place and run the data in in chunks as TG described. In that case you need to make sure you create a process that runs whole chunks in and can roll a whole chunk back out if anything fails. As long as you write that robustly you can leave it running in the background for days.
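    One way to get that "whole chunk in, or whole chunk rolled back" behaviour in MySQL is a stored procedure with an error handler (a minimal sketch, reusing the hypothetical staging table from the earlier posts):

    ```sql
    DELIMITER //
    CREATE PROCEDURE move_one_chunk()
    BEGIN
      -- Any error undoes the whole chunk and re-raises for the caller to log/retry.
      DECLARE EXIT HANDLER FOR SQLEXCEPTION
      BEGIN
        ROLLBACK;
        RESIGNAL;
      END;

      START TRANSACTION;
      INSERT INTO orders
        SELECT * FROM orders_staging ORDER BY id LIMIT 10000;
      DELETE FROM orders_staging ORDER BY id LIMIT 10000;
      COMMIT;
    END //
    DELIMITER ;

    -- Call repeatedly (e.g. from a scheduled job) until staging is empty.
    CALL move_one_chunk();
    ```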

    I wrote a process to redact all aged data from an insurance web site's backend (actually multiple sites masquerading as different companies, because that's the insurance industry) for GDPR. When it was first switched on it had to work its way through the whole backlog of records that had built up over years, pretty much since the internet was first a thing. It took over a month to clear that backlog. Lesson: if you work in manageable chunks and are rigorous enough in your process design, you can script something to run continuously in the background to process massive amounts of data. It'll get there eventually.
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  5. #5

    Thread Starter
    PowerPoster make me rain's Avatar
    Join Date
    Sep 2008
    Location
    india/Hubli
    Posts
    2,208

    Re: Best way to insert 7 million rows to a table

    Thanks to all.
    It worked out as:
    (1) The triggers were deleted
    (2) Then I uploaded the data
    Of course it took about 10 hours, but it worked out great.

    But I have another serious problem: the indexes are not working; a simple WHERE clause is taking too much time.
    How to solve this, please?

  6. #6
    Smooth Moperator techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,532

    Re: Best way to insert 7 million rows to a table

    (2) Then I uploaded the data
    Of course it took about 10 hours, but it worked out great
    That's good you got it loaded, but bad that it still took that long. It shouldn't have. But you didn't show us how you did it, so we can't offer any help with that.

    But I have another serious problem: the indexes are not working; a simple WHERE clause is taking too much time.
    How to solve this, please?
    Saying something isn't working without any further details is akin to walking into the doctor's office and saying "it hurts" with no other details. It doesn't work that way. What about your indexes isn't working? Have you run the DBCC reindexing command? How do you know they aren't working? What have you done to verify this?
    Also, this is now a new problem which really should get its own thread, with its own appropriate title too.

    -tg

  7. #7
    A SQL Server fool GaryMazzone's Avatar
    Join Date
    Aug 2005
    Location
    Dover,NH
    Posts
    7,495

    Re: Best way to insert 7 million rows to a table

    @tg he is running MySQL, not SQL Server. I also don't understand what "indexes are not working" means. If the rows are in the table, they should be picked up by the index. If not, are you even sure that the data went into the table? What about the primary key index? Is that right? Did you think about dropping and recreating one of the indexes to see if that is the issue?
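    Two quick MySQL checks for the "indexes not working" symptom (table and column names are placeholders):

    ```sql
    -- After a huge bulk load the optimizer's statistics are often stale,
    -- which can make it ignore perfectly good indexes.
    ANALYZE TABLE orders;

    -- Then inspect the plan for the slow query: a NULL in the `key` column
    -- means no index is being used for that WHERE clause.
    EXPLAIN SELECT * FROM orders WHERE order_date = '2020-01-01';
    ```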

  8. #8
    Smooth Moperator techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,532

    Re: Best way to insert 7 million rows to a table

    Quote Originally Posted by GaryMazzone View Post
    @tg he is running MySQL not SQL Server.
    Ahhh.... sorry, I somehow missed that in the initial post. My bad... then that might explain why it took so long. Still feel like that's too long though.
    Until we know more about what's "not right" with the indexes, it's hard to suggest where to go.

    -tg

  9. #9
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,900

    Re: Best way to insert 7 million rows to a table

    Given that he turned triggers off and then ran a load of data in, I'm willing to bet he dropped foreign keys and now they won't go back on again. Those triggers were there for a reason. But yeah, we need more information.

  10. #10
    Wall Poster TysonLPrice's Avatar
    Join Date
    Sep 2002
    Location
    Columbus, Ohio
    Posts
    3,834

    Re: Best way to insert 7 million rows to a table

    Quote Originally Posted by make me rain View Post
    Thanks to all.
    It worked out as:
    (1) The triggers were deleted
    (2) Then I uploaded the data
    Of course it took about 10 hours, but it worked out great.

    But I have another serious problem: the indexes are not working; a simple WHERE clause is taking too much time.
    How to solve this, please?
    I don't know about MySQL, but in Microsoft SQL Server you can just disable the trigger and then enable it.
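    For reference, that SQL Server syntax looks like this (trigger and table names hypothetical). MySQL has no built-in equivalent, which is why the drop-and-recreate approach was needed here:

    ```sql
    -- SQL Server only: switch a trigger off and back on without dropping it.
    DISABLE TRIGGER trg_orders_audit ON orders;
    -- ... bulk load here ...
    ENABLE TRIGGER trg_orders_audit ON orders;
    ```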
    Please remember next time...elections matter!
