|
-
Nov 7th, 2005, 08:03 PM
#1
Thread Starter
Addicted Member
Compare Database
Not sure if this is the right place, but I'm pretty stuck. I have 2 pretty large list ... like 1 million plus records & 10 Million plus.
One of them has a bit older data but I want to compare it to the new one and confirm the data.
The new one contains the 10 Million records.
Would importing into a both into a DB and using a VB compare string take forever?
Is there an efficient way?
-
Nov 7th, 2005, 08:09 PM
#2
Re: Compare Database
What comparison do you need? ID, Name?
-
Nov 7th, 2005, 08:31 PM
#3
Re: Compare Database
And very importantly - what database system are they currently stored in? Are they on the same server?
Using VB to compare is likely to be extremely slow in comparison to the other options which are available.
-
Nov 7th, 2005, 08:35 PM
#4
Thread Starter
Addicted Member
Re: Compare Database
I'm not using a specific database right now. I am aware that access is probably not powerfull enough for this so I was thinking mySQL.
Same Server?.... No just on my PC. Are you saying this will not be powerfull enough?
-
Nov 7th, 2005, 08:36 PM
#5
Thread Starter
Addicted Member
Re: Compare Database
Comparison is full address: #, Street, City/Town, Zip
-
Nov 7th, 2005, 08:40 PM
#6
Re: Compare Database
How is the data currently stored?
-
Nov 8th, 2005, 11:00 PM
#7
Thread Starter
Addicted Member
Re: Compare Database
Currently it is just in a csv file. I was going to import into Access or MySQL.
-
Nov 9th, 2005, 12:54 PM
#8
Re: Compare Database
Ok, I haven't used it before but from other peoples comments I would recommend importing into MySQL, and use SQL statements to do a comparison, eg:
Matching addresses:
Code:
SELECT table1.[#], table1.Street, table1.[City/Town], table1.Zip
FROM table1
INNER JOIN table2
WHERE table1.[#] = table2.[#]
AND table1.Street = table2.Street
AND table1.[City/Town] = table1.[City/Town]
AND table1.Zip = table2.Zip
Non-matching addresses:
Code:
SELECT table1.[#], table1.Street, table1.[City/Town], table1.Zip
FROM table1
WHERE Not Exists(
SELECT Street
FROM table2
WHERE table1.[#] = table2.[#]
AND table1.Street = table2.Street
AND table1.[City/Town] = table1.[City/Town]
AND table1.Zip = table2.Zip )
Note that the field names of "#" and "City/Town" probably wont be valid (or will cause issues), so I would recommend changing them.
-
Nov 9th, 2005, 01:05 PM
#9
Re: Compare Database
SQL Server has a function called BINARY_CHECKSUM. It calculates a checksum for a row, and can be used to detect changes.
I don't know if mySQL has a similar function, but if there is you could use it to create a checksum for each row in the two tables and compare the checksum value for each matching primary key in the two tables.
So instead of matching each column in the query (sloooooooooooow), you only match an integer value (muuuuuuch faster ).
-
Nov 11th, 2005, 12:27 PM
#10
Thread Starter
Addicted Member
Re: Compare Database
Ok - I think this is beyond me. Is there anyone or anywhere I can get a quote from on doing the following:
List 1: Approx. 1.3 Million Addresses
List 2: Approx. 10 - 15 Million Addresses
Note: List 1 is an outdated marketing list. List 2 is current address information.
I need to scrub List 1 against List 2 to verify that the addresses (Suite, Street No., Street Name, City, State, Zip) are still correct.
I would like the results to show which addresses have been confirmed on list 2.
Thanks for any guidance!
-
Nov 11th, 2005, 03:23 PM
#11
Fanatic Member
Re: Compare Database
This kinda points you in another direction but it sounds to me like something for which Perl was designed to handle and it would probably run in about the same amount of time that it would take to import list 2 into a db.
What bug? That's not a bug. It's an undocumented feature!
-
Nov 11th, 2005, 05:36 PM
#12
Re: Compare Database
With the additional information you have given...
Put the 10 million rows (new address file) into a SQL database. You need all kinds of indexes on key columns - like city and street.
The reason you put these into a database like SQL is so that you can find them quickly. It doesn't much matter if it's MS SQL, mySQL or probably even ACCESS (although that would certainly be my last choice).
Now the matching part - where you take the 1.3 million addresses from the "old" marketing list. There is no reason for them to ever go into SQL. You want to process them in VB or some other language - grab a row from that CSV text file and use ADO to find the matching row in the DB. You might have to make more then one hit on the DB to find a match - being more or less specific as you go through it.
I see no real reason to load the 1.2 million addresses into a DB...
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|