[2005] ETL efficiency

**MaslowB** · Jan 8th, 2009, 09:06 AM

I'm doing some limited ETL in Visual Studio because we haven't gotten much ease of use out of SSIS with Teradata connections into and out of user workspaces on Teradata, where what works depends on who runs the packages.

I'm mostly looking for pointers on coding an efficient data flow between systems. Currently I've designed a class that runs a SELECT on one system and pulls the results into a DataTable. It then dynamically builds an INSERT statement from the number and names of the columns in that DataTable and the target table name supplied. It has special handling for null values and date data types.
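For anyone following along, here is roughly what that insert builder looks like, sketched in Python rather than VB.NET since the original class isn't posted. The helper names (`sql_literal`, `build_insert`) are hypothetical, and the `DATE '...'` / `TIMESTAMP '...'` literal formats are an assumption about what the target accepts. Nulls become a bare `NULL`, dates get their own literal, and embedded single quotes in text are doubled.

```python
import datetime


def sql_literal(value):
    """Render one value as a SQL literal (hypothetical helper, not the poster's class)."""
    if value is None:
        return "NULL"  # nulls must not be quoted
    # datetime is a subclass of date, so test for it first
    if isinstance(value, datetime.datetime):
        # Assumed target format: ANSI timestamp literal
        return "TIMESTAMP '" + value.strftime("%Y-%m-%d %H:%M:%S") + "'"
    if isinstance(value, datetime.date):
        # Assumed target format: ANSI date literal
        return "DATE '" + value.isoformat() + "'"
    if isinstance(value, (int, float)):
        return str(value)
    # Everything else is treated as text; double any embedded single quotes
    return "'" + str(value).replace("'", "''") + "'"


def build_insert(table, columns, row):
    """Build one INSERT from the column names and a single row of values."""
    col_list = ", ".join(columns)
    values = ", ".join(sql_literal(v) for v in row)
    return f"INSERT INTO {table} ({col_list}) VALUES ({values});"
```

Building literals by hand works, but every new data type needs another branch in `sql_literal`; a parameterized command lets the driver handle quoting and type conversion instead.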

My next idea for improving efficiency is to add a 'batch size' setting, so it builds X inserts and submits them to the database together instead of submitting one insert at a time.
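A minimal sketch of the batch-size idea, assuming a DB-API style driver. Python's built-in `sqlite3` stands in for the Teradata connection here, and `load_in_batches`, the table, and the default batch size of 500 are all hypothetical. Rows are split into chunks of `batch_size`, and each chunk goes to the driver in one `executemany` call inside its own transaction, rather than one statement per row.

```python
import sqlite3
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_in_batches(conn, table, columns, rows, batch_size=500):
    """Insert rows in chunks; returns the number of batches submitted."""
    # One parameterized statement, reused for every row (sqlite3 uses '?' placeholders)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    submitted = 0
    for chunk in batches(rows, batch_size):
        # Hand the driver the whole chunk at once; `with conn` commits it,
        # or rolls the chunk back if any row fails
        with conn:
            conn.executemany(sql, chunk)
        submitted += 1
    return submitted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
load_in_batches(conn, "target", ["id", "name"], [(i, f"row{i}") for i in range(1050)])
```

With a networked database the gain usually comes from fewer round trips and fewer commits per row; the right batch size is something to measure against the real Teradata connection rather than assume.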

I was also going to try having it dynamically build the insert command for a TableAdapter, then figure out how to submit the entire DataTable through the TableAdapter and see how well it pumps the data across. One concern with this approach is that in the ETLs where I push data to Teradata, date columns on different tables may be formatted differently. I could probably compensate for that with some complicated SQL on the select side.
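One alternative to complicated select-side SQL is to normalize the date columns client-side, per target table, before handing the whole table to a single parameterized insert. The sketch below is in Python rather than .NET, and `TARGET_DATE_FORMATS`, its table and column names, and its format strings are hypothetical placeholders for whatever each Teradata table actually expects. It only reformats values that are real dates and leaves nulls untouched.

```python
import datetime

# Hypothetical per-table layout: column name -> strftime format the target expects
TARGET_DATE_FORMATS = {
    "stage.orders": {"order_dt": "%Y-%m-%d"},
    "stage.shipments": {"ship_dt": "%Y/%m/%d", "recv_dt": "%m/%d/%Y"},
}


def normalize_dates(table, columns, rows, formats=TARGET_DATE_FORMATS):
    """Return rows with this table's date columns rendered as formatted strings."""
    col_formats = formats.get(table, {})
    # Positions of the columns that need reformatting for this table
    targets = [(i, col_formats[c]) for i, c in enumerate(columns) if c in col_formats]
    out = []
    for row in rows:
        row = list(row)
        for i, fmt in targets:
            if isinstance(row[i], datetime.date):  # leave NULLs (None) alone
                row[i] = row[i].strftime(fmt)
        out.append(tuple(row))
    return out
```

Keeping the formats in one lookup means adding a new target table is a data change rather than another hand-written SELECT, though whether the string or a native date parameter works better depends on the target column types.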
