Results 1 to 6 of 6

Thread: Excel Advice 440K rows

  1. #1

    Thread Starter
    PowerPoster Static's Avatar
    Join Date
    Oct 2000
    Location
    Rochester, NY
    Posts
    9,390

    Question Excel Advice 440K rows

    I need to create a VBA script that will scan the current xls sheet (XLSA) (one column in up to for sheets) and compare it to a second xls (XLSB) to check if the ID has bee used before. Easy enough, except that the xls being compared to has 443K rows.

    Just wanting advice to best tackle this. Should I read the entire 440K rows (a single column) into an array then scan the array for the value in XLSA on each row? or use Find? it needs to be very accurate in that we cannot have a duplicate ID.

    Thanks!
    JPnyc rocks!! (Just ask him!)
    If u have your answer please go to the thread tools and click "Mark Thread Resolved"

  2. #2
    PowerPoster
    Join Date
    Oct 2010
    Posts
    2,141

    Re: Excel Advice 440K rows

    I would first sort the 443K ID column in ascending order and use a binary search algorithm to search the for the test value. This will help no matter which memory usage scheme you select by minimizing value look up and comparisons. I found this comparison of Find vs Binary search that you can review. http://stackoverflow.com/a/1893241/2592875

    You can implement the search on either the sorted column or read the values into an array. If the values are strictly numeric (Integer) or comprise relatively short strings, I would tend to use an array. If they are long strings, you will have to evaluate the impact of consuming a large amount of memory versus the cost of directly accessing the Worksheet.

  3. #3

    Thread Starter
    PowerPoster Static's Avatar
    Join Date
    Oct 2000
    Location
    Rochester, NY
    Posts
    9,390

    Re: Excel Advice 440K rows

    its not big... xls file is like 4 mb so the text will be less.... I wonder If I can push the compare list out as a CSV to make it smaller before even loading.

    Thanks!
    JPnyc rocks!! (Just ask him!)
    If u have your answer please go to the thread tools and click "Mark Thread Resolved"

  4. #4
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Excel Advice 440K rows

    i am reasonable sure a collection would work faster than an array for lookups, even though you would have to process all the values into the collection first, there was a thread about this in the vb6 forum about a couple of months ago
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  5. #5
    PowerPoster
    Join Date
    Oct 2008
    Location
    Midwest Region, United States
    Posts
    3,574

    Re: Excel Advice 440K rows

    Static,

    Do you need to confirm whether or not there are already duplicates in the list of 443k, or just that the new value (in sheet A) is not already in the big list?

  6. #6
    Fanatic Member dmaruca's Avatar
    Join Date
    May 2006
    Location
    Jacksonville, FL
    Posts
    577

    Re: Excel Advice 440K rows

    It sounds like you're just wanting to vlookup the values to test for duplicates. If they're all #N/A then they are unique. A lookup on 440k rows with a low hit-rate could take a long time. You could try this addin that works using a background SQLite database for indexed matching:

    http://fastervlookup.codeplex.com/

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width