-
Compare Database
Not sure if this is the right place, but I'm pretty stuck. I have 2 pretty large list ... like 1 million plus records & 10 Million plus.
One of them has a bit older data but I want to compare it to the new one and confirm the data.
The new one contains the 10 Million records.
Would importing into a both into a DB and using a VB compare string take forever?
Is there an efficient way?
-
Re: Compare Database
What comparison do you need? ID, Name?
-
Re: Compare Database
And very importantly - what database system are they currently stored in? Are they on the same server?
Using VB to compare is likely to be extremely slow in comparison to the other options which are available.
-
Re: Compare Database
I'm not using a specific database right now. I am aware that access is probably not powerfull enough for this so I was thinking mySQL.
Same Server?.... No just on my PC. Are you saying this will not be powerfull enough?
-
Re: Compare Database
Comparison is full address: #, Street, City/Town, Zip
-
Re: Compare Database
How is the data currently stored?
-
Re: Compare Database
Currently it is just in a csv file. I was going to import into Access or MySQL.
-
Re: Compare Database
Ok, I haven't used it before but from other peoples comments I would recommend importing into MySQL, and use SQL statements to do a comparison, eg:
Matching addresses:
Code:
SELECT table1.[#], table1.Street, table1.[City/Town], table1.Zip
FROM table1
INNER JOIN table2
WHERE table1.[#] = table2.[#]
AND table1.Street = table2.Street
AND table1.[City/Town] = table1.[City/Town]
AND table1.Zip = table2.Zip
Non-matching addresses:
Code:
SELECT table1.[#], table1.Street, table1.[City/Town], table1.Zip
FROM table1
WHERE Not Exists(
SELECT Street
FROM table2
WHERE table1.[#] = table2.[#]
AND table1.Street = table2.Street
AND table1.[City/Town] = table1.[City/Town]
AND table1.Zip = table2.Zip )
Note that the field names of "#" and "City/Town" probably wont be valid (or will cause issues), so I would recommend changing them.
-
Re: Compare Database
SQL Server has a function called BINARY_CHECKSUM. It calculates a checksum for a row, and can be used to detect changes.
I don't know if mySQL has a similar function, but if there is you could use it to create a checksum for each row in the two tables and compare the checksum value for each matching primary key in the two tables.
So instead of matching each column in the query (sloooooooooooow), you only match an integer value (muuuuuuch faster :thumb:).
-
Re: Compare Database
Ok - I think this is beyond me. Is there anyone or anywhere I can get a quote from on doing the following:
List 1: Approx. 1.3 Million Addresses
List 2: Approx. 10 - 15 Million Addresses
Note: List 1 is an outdated marketing list. List 2 is current address information.
I need to scrub List 1 against List 2 to verify that the addresses (Suite, Street No., Street Name, City, State, Zip) are still correct.
I would like the results to show which addresses have been confirmed on list 2.
Thanks for any guidance!
-
Re: Compare Database
This kinda points you in another direction but it sounds to me like something for which Perl was designed to handle and it would probably run in about the same amount of time that it would take to import list 2 into a db.
-
Re: Compare Database
With the additional information you have given...
Put the 10 million rows (new address file) into a SQL database. You need all kinds of indexes on key columns - like city and street.
The reason you put these into a database like SQL is so that you can find them quickly. It doesn't much matter if it's MS SQL, mySQL or probably even ACCESS (although that would certainly be my last choice).
Now the matching part - where you take the 1.3 million addresses from the "old" marketing list. There is no reason for them to ever go into SQL. You want to process them in VB or some other language - grab a row from that CSV text file and use ADO to find the matching row in the DB. You might have to make more then one hit on the DB to find a match - being more or less specific as you go through it.
I see no real reason to load the 1.2 million addresses into a DB...