Results 1 to 19 of 19

Thread: Mostly Merging Multiple Tables

  1. #1

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,988

    Mostly Merging Multiple Tables

    I looked around for an easier way to do this and didn't find anything that quite suited. Perhaps there isn't anything, but I thought I'd ask here, first.

    The situation I have is this:

    1) I have some number of tables (a dozen, or so) in a datatable.
    2) Each table is different from one another, but each matches a table in a database exactly (same number of fields, same names on the fields, same names on the tables, same datatypes, same constraints).
    3) The records in the datatable may or may not match records in the matching DB table, so for every record in the datatable, there will be either an INSERT or an UPDATE.
    4) I don't have to worry about concurrency issues. There won't be a case where two processes will ever be working on the same record in the database.
    5) This has to be in code. A stored procedure would be a significant problem (or at least I'd prefer to avoid crossing that bridge at this time).

    So, the brute force approach would be to write an UPDATE statement that updated every field in the DB table (other than the primary key field) with the matching field in a datarow from the datatable, where the PK are the same. If that returns 0, then do an INSERT.

    The brute force approach would require that the UPDATE and INSERT methods be generated somewhat automatically, since there are so many tables.

    One option that I can see would be to create a local datatable, and use a dataadapter to populate it from the database table with all records that matched the PKs in the incoming datatable. In other words, SELECT all records from the DB table for which the PK of the record matches one of the PKs in the incoming datatable. That would give me a local datatable, I could then go through each datarow in the incoming datatable and either update the local, or add a new record if the row was new. Then UPDATE the local datatable back to the DB.

    This approach has an issue, since that initial SELECT would be a bit of a chore, and this would essentially be copying the data from an existing datatable into a new one for the sole purpose of toggling the Status of the datarow appropriately. Therefore, it seems like there ought to be a better way.

    One thing I have encountered is Merge. Some people argue that it shouldn't be used because of the potential for race conditions and deadlocks, but those all seem to occur as a result of concurrency issues, and I'm pretty sure I can ignore those, in this case. However, there IS some slim chance that I'd re-use the code in such a way that concurrency might, conceivably, matter. Furthermore, I have only seen MERGE examples in stored procedures, though some look like they could be used elsewhere.

    So, is there a better way for my circumstances?
    My usual boring signature: Nothing

  2. #2

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,988

    Re: Mostly Merging Multiple Tables

    From the lack or responses, I'm beginning to think that this isn't something that has a standard solution.
    My usual boring signature: Nothing

  3. #3
    PowerPoster
    Join Date
    Sep 2005
    Location
    Modesto, Ca.
    Posts
    5,195

    Re: Mostly Merging Multiple Tables

    I'll respond, just so you don't feel ignored.

    Is there any user interaction with this data? Or is this an import and then update DB. If there's not user interaction my first question would be why put multiple imports in one datatable and then process. I could see doing it if you want to display all the import data to the user.

  4. #4

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,988

    Re: Mostly Merging Multiple Tables

    What is happening is an export of selected records from one DB into a file. The file contains a zipped, XML-serialized dataset containing the records. So, on the import side of things, I'm taking that file, unzipping it, then deserializing back into a dataset. That dataset will then be upserted into the receiving DB. Since a person might do this more than once for a variety of reasons (including whim, accident, forgetfulness, and also because the source data might have changed, necessitating a re-export of the data), the records may or may not exist in the destination DB. Thus, I can't just insert them.

    One fortunate piece is that the PKs on all the tables are GUIDs, so recognizing an existing record on the destination is relatively trivial.
    My usual boring signature: Nothing

  5. #5
    PowerPoster
    Join Date
    Sep 2005
    Location
    Modesto, Ca.
    Posts
    5,195

    Re: Mostly Merging Multiple Tables

    Don't get your hopes up I'm not responding because I have a solution.

    Your first post made me think you were importing data from various export sources and putting them all in one datatable. Your last post sounds like your importing data from one source but maybe the same data(or edited data) multiple times. If it's from one source, if you import it multiple times, wouldn't editing the existing datatable make life easier when your ready to update the database.

    I'm probably misunderstanding because I don't think any of these concepts are new to you.

  6. #6
    King of sapila
    Join Date
    Oct 2006
    Location
    Greece
    Posts
    6,597

    Re: Mostly Merging Multiple Tables

    Let me chip in (where are they usual dbiers ? ).
    I admit we mostly insert anything in the db's as we want to have all the data and then distinguish.
    If what is needed is only select data that is not similar with no concurrency issue then how about except?
    https://docs.microsoft.com/en-us/sql...l-server-ver15
    Then what is kept will bi inserted.
    Again admittedly I have worked this only a little because as I've said we are data hungry over here due to the fact that the initial data is not in our control so we must first get EVERYTHING and then see what we can do.
    AndofcoursesecondlyI'mnotansqlexpert.
    ἄνδρα μοι ἔννεπε, μοῦσα, πολύτροπον, ὃς μάλα πολλὰ
    πλάγχθη, ἐπεὶ Τροίης ἱερὸν πτολίεθρον ἔπερσεν·

  7. #7
    Smooth Moperator techgnome's Avatar
    Join Date
    May 2002
    Posts
    34,532

    Re: Mostly Merging Multiple Tables

    Why can't you just shove the data into a staging table and then upserted into the final table? That's what I'd do with it. I never just import data. Ever. I don't care where it comes from, don't trust it. Don't trust, will verify. Usually I'm dealing with data from unknown sources, or at least in formats that are incompatible with the receiving system, but still, anytime data is coming in, trust or not, always verify it. And if that means stuffing it into a staging table, so be it

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  8. #8
    PowerPoster
    Join Date
    Sep 2005
    Location
    Modesto, Ca.
    Posts
    5,195

    Re: Mostly Merging Multiple Tables

    1) I have some number of tables (a dozen, or so) in a datatable.
    Yeah, I'm having a hard time understanding the reason for having dozens of tables in one datatable if there is no user interaction. I know I'm being more curious than helpful.

  9. #9
    PowerPoster Zvoni's Avatar
    Join Date
    Sep 2012
    Location
    To the moon and then left
    Posts
    4,418

    Re: Mostly Merging Multiple Tables

    Why would you think, Merge only works in a SP?

    https://www.hackdeploy.com/how-to-ru...t-using-merge/
    Last edited by Zvoni; Tomorrow at 31:69 PM.
    ----------------------------------------------------------------------------------------

    One System to rule them all, One Code to find them,
    One IDE to bring them all, and to the Framework bind them,
    in the Land of Redmond, where the Windows lie
    ---------------------------------------------------------------------------------
    People call me crazy because i'm jumping out of perfectly fine airplanes.
    ---------------------------------------------------------------------------------
    Code is like a joke: If you have to explain it, it's bad

  10. #10
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,900

    Re: Mostly Merging Multiple Tables

    Do you mean you actually have the multiple data tables in a c#/vb dataset? If so you should just be able to bind that back to the actual database tables using a dataadapter, specify the primary keys and insert/update statements and call update. An upsert is the default behaviour when everything's wired up correctly.

    (I feel like I've probably miss-understood the issue here)
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  11. #11

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,988

    Re: Mostly Merging Multiple Tables

    Quote Originally Posted by Zvoni View Post
    Why would you think, Merge only works in a SP?

    https://www.hackdeploy.com/how-to-ru...t-using-merge/
    I didn't mean to say that, to be fair. I realized that I'd be able to do a merge, I just also found that there were people who were opposed to merge for a variety of reasons. I don't believe those reasons apply to me, but I couldn't be certain, so I was leaning away from merge.
    My usual boring signature: Nothing

  12. #12
    A SQL Server fool GaryMazzone's Avatar
    Join Date
    Aug 2005
    Location
    Dover,NH
    Posts
    7,495

    Re: Mostly Merging Multiple Tables

    My complaint on merge statement is purely performance related... It is horrible when ever I used it
    Sometimes the Programmer
    Sometimes the DBA

    Mazz1

  13. #13
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,900

    Re: Mostly Merging Multiple Tables

    I just also found that there were people who were opposed to merge for a variety of reasons
    It's an aside to your main query but the main risk is that the individual inserts and updates are run as individual statements, not a single atomic statement. You're safe doing it in a off-line DB (as long as your merge source doesn't collide with itself in which case I'm pretty sure you get an error anyway) and in a live DB you can simply wrap it in a transaction (as long as you're not worried about lock blocking).
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  14. #14

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,988

    Re: Mostly Merging Multiple Tables

    Some interesting responses. I think I should explain things a bit more generally.

    What is happening is that I have a program that is storing some information in a database. What is being stored is the information necessary to create a remarkably complicated model (it resembles a neural net, or a state machine, but it REALLY is not). The information is spread across a series of tables in the database, which is understandable considering the complexity of the model. My expectation is that somebody will export the information for the model and email it to somebody somewhere else in the country, and that person will import the information into their database. So, what I am doing is building a dataset because the model is spread across a series of tables, and because a dataset can so readily be saved to an XML file, which is easily emailed. The XML file is zipped to make it more compact and easier to email, but that's also easy. So, the process is: database->dataset -> string->text file->string->dataset->other database.

    @Funky: I was thinking that the dataadapter.Update wouldn't be possible because the dataadapter updates tables based on the RowState field (or is it RowStatus) of the datarows. In this case, the RowState for the incoming datatables will be wrong. They will either all be Unchanged, though I could probably set them all to something else. What I can't easily do is say whether they are Added, Modified, or Unchanged. I believe that means that the dataadapter will try to do the wrong thing. If they are all Added, it would try to call INSERT, which is often appropriate. If they are Unchanged, then nothing will happen, which is rarely going to be appropriate. And I probably can't get them all to be Modified, but then the dataadapter would call the UPDATE command, which is occasionally going to be right and otherwise will be wrong.

    @TG: I did find that approach. I like it in general, but was kind of avoiding it, and I might be wrong to do so. It seems like using a staging table and upsert would only work in an ad hoc approach or a stored procedure. Ad hoc isn't going to work, but a stored procedure might.

    However, I should add a further bit of explanation: The process that creates this file to be emailed and the process that accepts the file are optional plugins to a larger program. People can choose to add this functionality, or not. The plugin that received the file could write a stored procedure (and could either leave it behind or clean it up afterwards) to the database, I've just been trying to avoid doing so. They have the means to do that, and I know how to do that, but I'd rather NOT do that if I can avoid it because these are plugins, and I'd like them to touch the DB as lightly as they can, though this isn't sensitive data and the databases will tend to be physically isolated, so I don't have any particular concern about having a plugin do something to the DB.

    Also, I do have considerable trust in this data. The file will be checked to see that it has the right set of tables with the right set of fields. I'm not then checking that the fields contain the right type of data, but only because I feel it would be excessive. This is fish data. It's not particularly sensitive...just fish need to scale. It would be technically possible for a person to construct a file that had the right set of tables with the right set of fields, but contained some kind of malicious data...however, that would be a whole lot of work to do virtually no damage to anything at all, so it's very unlikely.

    So, is a staging table still an option? I could run a series of commands: Create Table (to create the staging table), push all the records to the table, upsert from the staging table to the actual table, then drop the staging table. Possible, I'd say, but is there some advantage to doing it that way?
    Last edited by Shaggy Hiker; Jan 19th, 2022 at 11:10 AM.
    My usual boring signature: Nothing

  15. #15

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,988

    Re: Mostly Merging Multiple Tables

    Quote Originally Posted by FunkyDexter View Post
    It's an aside to your main query but the main risk is that the individual inserts and updates are run as individual statements, not a single atomic statement. You're safe doing it in a off-line DB (as long as your merge source doesn't collide with itself in which case I'm pretty sure you get an error anyway) and in a live DB you can simply wrap it in a transaction (as long as you're not worried about lock blocking).
    That was my understanding. I expect that the DB will only ever have a single user at any given time. It would be possible for there to be a few simultaneous users, but that would be...really quite marvelous. The DB would be at fish hatcheries, and no hatchery has more than a few people around...and they don't tend to spend a whole lot of time working with data. If they got a file to import, that import could be reasonably expected to have the DB to itself for the few seconds (or more likely milliseconds) that the import took to complete.
    My usual boring signature: Nothing

  16. #16
    PowerPoster
    Join Date
    Sep 2005
    Location
    Modesto, Ca.
    Posts
    5,195

    Re: Mostly Merging Multiple Tables

    Instead of constantly checking the local db to find if the datarow is a ADD. My GUESS is it would be faster to create a dataset with all the records from the necessary tables from the local DB an use the datatable FIND method to check if the PK exists. Then you can use the dataadapter Update method.

    You mentioned in your first post of creating datatables from the local db with the records that match the PK's in the import dataset. But why not just take all the records for those tables from the local db?
    Last edited by wes4dbt; Jan 19th, 2022 at 02:32 PM.

  17. #17

    Thread Starter
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    38,988

    Re: Mostly Merging Multiple Tables

    Cause there are a whole lot of them.
    My usual boring signature: Nothing

  18. #18
    PowerPoster
    Join Date
    Sep 2005
    Location
    Modesto, Ca.
    Posts
    5,195

    Re: Mostly Merging Multiple Tables

    Yeah, I got no idea what that "whole lot" number of records would be to where retrieving only records that are in the import or constantly querying the database would be faster. If faster is even a factor.

  19. #19
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,900

    Re: Mostly Merging Multiple Tables

    the RowState for the incoming datatables will be wrong
    Ah, I think you're right on that. My bad.
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width