|
-
Oct 14th, 2015, 10:03 AM
#1
Excel Advice 440K rows
I need to create a VBA script that will scan the current xls sheet (XLSA) (one column in up to for sheets) and compare it to a second xls (XLSB) to check if the ID has bee used before. Easy enough, except that the xls being compared to has 443K rows.
Just wanting advice to best tackle this. Should I read the entire 440K rows (a single column) into an array then scan the array for the value in XLSA on each row? or use Find? it needs to be very accurate in that we cannot have a duplicate ID.
Thanks!
JPnyc rocks!! (Just ask him!)
If u have your answer please go to the thread tools and click "Mark Thread Resolved"
-
Oct 14th, 2015, 11:02 AM
#2
Re: Excel Advice 440K rows
I would first sort the 443K ID column in ascending order and use a binary search algorithm to search the for the test value. This will help no matter which memory usage scheme you select by minimizing value look up and comparisons. I found this comparison of Find vs Binary search that you can review. http://stackoverflow.com/a/1893241/2592875
You can implement the search on either the sorted column or read the values into an array. If the values are strictly numeric (Integer) or comprise relatively short strings, I would tend to use an array. If they are long strings, you will have to evaluate the impact of consuming a large amount of memory versus the cost of directly accessing the Worksheet.
-
Oct 14th, 2015, 01:13 PM
#3
Re: Excel Advice 440K rows
its not big... xls file is like 4 mb so the text will be less.... I wonder If I can push the compare list out as a CSV to make it smaller before even loading.
Thanks!
JPnyc rocks!! (Just ask him!)
If u have your answer please go to the thread tools and click "Mark Thread Resolved"
-
Oct 15th, 2015, 03:56 AM
#4
Re: Excel Advice 440K rows
i am reasonable sure a collection would work faster than an array for lookups, even though you would have to process all the values into the collection first, there was a thread about this in the vb6 forum about a couple of months ago
i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next
dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part
come back and mark your original post as resolved if your problem is fixed
pete
-
Oct 15th, 2015, 07:20 AM
#5
Re: Excel Advice 440K rows
Static,
Do you need to confirm whether or not there are already duplicates in the list of 443k, or just that the new value (in sheet A) is not already in the big list?
-
Oct 16th, 2015, 10:29 AM
#6
Fanatic Member
Re: Excel Advice 440K rows
It sounds like you're just wanting to vlookup the values to test for duplicates. If they're all #N/A then they are unique. A lookup on 440k rows with a low hit-rate could take a long time. You could try this addin that works using a background SQLite database for indexed matching:
http://fastervlookup.codeplex.com/
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|