I am trying to insert about 3,000,000 records from a Text File into a SQL Server database. What is the fastest method for this operation? Also I have to check the data of each row before insert.
Printable View
I am trying to insert about 3,000,000 records from a Text File into a SQL Server database. What is the fastest method for this operation? Also I have to check the data of each row before insert.
i believe, script will be the best way to do that...
or maybe you can write some code to read from the text file (streaming) and insert into the sql server with queries...
lol., sorry, im also a noob...
You should use Bulk insert. Here's the syntax: http://msdn.microsoft.com/library/de...ba-bz_4fec.asp
TPM
The "best" way (which is highly subjective) would be like this:
Use Bulk Copy File (BCP) to import the data into a staging table, not the final location. Then once all the data is in it, then run queries on it to validate your data, set a status field on the record to indicate if it's clean or not. Then run an INSERT query to insert the data INTO your final table SELECTing FROM the staging table.
What you DO NOT WANT TO:
1) Use a cursor of ANY kind, the process will run for hours.
2) Use ADO or a loop to read the file line by line and do a .AddNew -- your process will run for hours
Believe me... I've had to build processes that had to number crunching similar to that. Using the BCP/staging table/validation/copy data format changed our process from 3 hours to less than 2 minutes. And that was with the number of records ranging from 750,000 to 1million.... I'd hate to see how long 3mil would take.
Tg
Definately use BULK INSERT...
If the data is not in a format that BULK INSERT can handle, then pre-process it with a VB program - maybe into a TAB-delimited format (that's what we prefer in our shop).
We regularly import text files with that many rows - and always use BULK INSERT.
hi,
For that sort of datasize maybe I wouldn't write my ove code!
Might be worth looking at the import/export menu path off the sqlserver client.
this will create a "dts job" (dts = data tranformation services) - (can also be scheduled in batch regulary) - there are stacks of options and i believe its very very fast. (faster that any script - even those that use bulk insert) :) (its equivalent to sqlldr util in oracle environments..)
(in our case we grab loads of data from oracle (remote) into our (local) sql server overnight - but dts jobs can handle text files as the datasource).
dts jobs can also do lookups and validation (i have not done this but there is lots of stuff on it in the help files associated witht he client)
will need to load the client tools...
hope that might be usefull, cheers AJP
Actually.... Bulk Copy Insert is faster than a DTS for one simple reason. the BCP bypasses the transaction log and writes the data directly into the tables. DTS writes everything to the transaction log, and for data of this size, that's what will hold up most of the processing. I think (but not 100% sure) that BCP will also bypass any triggers on the table being imported into.
Tg
tg is correct in that it's a not-logged insert - which is why it's fast.
It also does bypass "referential integrity" to some degree - but I would imagine that some of that can be controlled.
I think DTS would be faster than reformatting the data though....
Does that also mean that it will bypass triggers as well? We don't use triggers here (on penalty of death by paper cuts) so I'm not too clear on what would happen in that case.Quote:
Originally Posted by szlamany
Tg
Maybe, maybe not. It depends on the format of the original data. Inserting into a database takes alot of resources. Opening two files, loading the data from one, doing a replace on it, and storing it into the new file should go rather quickly. It just depends on the originating data and what needs to be done to it.
Tg
Tg - I'm not sure - as we also dislike triggers... (yuk, bury business rules in hard to find places when you have STORED PROCEDURES where it really belongs)Quote:
Originally Posted by techgnome
BTW - as far as reformatting data - we prefer to do it in VB, where we have complete control - then utilize the DATABASE ENGINE for as little as is required - BULK INSERT alone...
A simple replace would imply a standard delimiter which could be set in the parameters anyway.
I was thinking more for XML files etc...