Compare Database

Printable View

Nov 7th, 2005, 08:03 PM
dude131

Compare Database

Not sure if this is the right place, but I'm pretty stuck. I have 2 pretty large list ... like 1 million plus records & 10 Million plus.

One of them has a bit older data but I want to compare it to the new one and confirm the data.

The new one contains the 10 Million records.

Would importing into a both into a DB and using a VB compare string take forever?

Is there an efficient way?
Nov 7th, 2005, 08:09 PM
dee-u

Re: Compare Database

What comparison do you need? ID, Name?
Nov 7th, 2005, 08:31 PM
si_the_geek

Re: Compare Database

And very importantly - what database system are they currently stored in? Are they on the same server?

Using VB to compare is likely to be extremely slow in comparison to the other options which are available.
Nov 7th, 2005, 08:35 PM
dude131

Re: Compare Database

I'm not using a specific database right now. I am aware that access is probably not powerfull enough for this so I was thinking mySQL.

Same Server?.... No just on my PC. Are you saying this will not be powerfull enough?
Nov 7th, 2005, 08:36 PM
dude131

Re: Compare Database

Comparison is full address: #, Street, City/Town, Zip
Nov 7th, 2005, 08:40 PM
si_the_geek

Re: Compare Database

How is the data currently stored?
Nov 8th, 2005, 11:00 PM
dude131

Re: Compare Database

Currently it is just in a csv file. I was going to import into Access or MySQL.
Nov 9th, 2005, 12:54 PM
si_the_geek

Re: Compare Database

Ok, I haven't used it before but from other peoples comments I would recommend importing into MySQL, and use SQL statements to do a comparison, eg:

Matching addresses:

Code:

SELECT table1.[#], table1.Street, table1.[City/Town], table1.Zip FROM table1 INNER JOIN table2 WHERE table1.[#] = table2.[#] AND table1.Street = table2.Street AND table1.[City/Town] = table1.[City/Town] AND table1.Zip = table2.Zip

Non-matching addresses:

Code:

SELECT table1.[#], table1.Street, table1.[City/Town], table1.Zip FROM table1 WHERE Not Exists( SELECT Street FROM table2 WHERE table1.[#] = table2.[#] AND table1.Street = table2.Street AND table1.[City/Town] = table1.[City/Town] AND table1.Zip = table2.Zip )

Note that the field names of "#" and "City/Town" probably wont be valid (or will cause issues), so I would recommend changing them.
Nov 9th, 2005, 01:05 PM
kaffenils

Re: Compare Database

SQL Server has a function called BINARY_CHECKSUM. It calculates a checksum for a row, and can be used to detect changes.

I don't know if mySQL has a similar function, but if there is you could use it to create a checksum for each row in the two tables and compare the checksum value for each matching primary key in the two tables.
So instead of matching each column in the query (sloooooooooooow), you only match an integer value (muuuuuuch faster :thumb:).
Nov 11th, 2005, 12:27 PM
dude131

Re: Compare Database

Ok - I think this is beyond me. Is there anyone or anywhere I can get a quote from on doing the following:

List 1: Approx. 1.3 Million Addresses
List 2: Approx. 10 - 15 Million Addresses

Note: List 1 is an outdated marketing list. List 2 is current address information.

I need to scrub List 1 against List 2 to verify that the addresses (Suite, Street No., Street Name, City, State, Zip) are still correct.

I would like the results to show which addresses have been confirmed on list 2.

Thanks for any guidance!
Nov 11th, 2005, 03:23 PM
anotherVBnewbie

Re: Compare Database

This kinda points you in another direction but it sounds to me like something for which Perl was designed to handle and it would probably run in about the same amount of time that it would take to import list 2 into a db.
Nov 11th, 2005, 05:36 PM
szlamany

Re: Compare Database

With the additional information you have given...

Put the 10 million rows (new address file) into a SQL database. You need all kinds of indexes on key columns - like city and street.

The reason you put these into a database like SQL is so that you can find them quickly. It doesn't much matter if it's MS SQL, mySQL or probably even ACCESS (although that would certainly be my last choice).

Now the matching part - where you take the 1.3 million addresses from the "old" marketing list. There is no reason for them to ever go into SQL. You want to process them in VB or some other language - grab a row from that CSV text file and use ADO to find the matching row in the DB. You might have to make more then one hit on the DB to find a match - being more or less specific as you go through it.

I see no real reason to load the 1.2 million addresses into a DB...