If your data comes from a database, then something like this would be much easier to do on the SQL side...
Removing items in an array is never easy. If you convert the array into a collection, then it will be much easier. Is that an option for you ? or does it HAVE to be an array ?
By the way, how many items in the array ? thousands, millions ?
The brute force ways are relatively simple--loop over A and check if each element is in B. If not, add it to the end of another data structure--ReDim Preserve would work, for instance. This is hideously inefficient, but for smallish arrays it just doesn't matter.
More efficient methods could use a hash function. Hash B into its own boolean array HashB. Iterate through A, hashing each A(a). Check if HashB(Hash(A(a))) is true; if so, and you didn't have any collisions while making HashB, mark a down in a separate boolean array the same length as A. If you did have collisions, first test if A(a) = B(b) for each b that went into Hash(A(a))--you'd need to keep a list somehow. Finally, after you've finished marking which ones to delete, create a whole new array by iterating through your boolean marks. You should keep a count of how many deleted items there are so you can figure out the length of the final array in one operation.
The time you enjoy wasting is not wasted time. Bertrand Russell
The brute force ways are relatively simple--loop over A and check if each element is in B. If not, add it to the end of another data structure--ReDim Preserve would work, for instance. This is hideously inefficient, but for smallish arrays it just doesn't matter.
More efficient methods could use a hash function. Hash B into its own boolean array HashB. Iterate through A, hashing each A(a). Check if HashB(Hash(A(a))) is true; if so, and you didn't have any collisions while making HashB, mark a down in a separate boolean array the same length as A. If you did have collisions, first test if A(a) = B(b) for each b that went into Hash(A(a))--you'd need to keep a list somehow. Finally, after you've finished marking which ones to delete, create a whole new array by iterating through your boolean marks. You should keep a count of how many deleted items there are so you can figure out the length of the final array in one operation.
Sounds like both your methods are pretty much the same, actually your seocnd method has more unneeded work.
But as you said for a resonable array size, for say arrays "a" and "b", loop through "a" and "b" at same time and check for any NON duplicates send these non(or not non) duplicates to a new array(tmpArr), then if you like erase the old array and add these temp array elements dups back to the old array.
If it was a big array and going to eat up your system i would write it in C++ (but i always say that ) you can call the c++ from VB if you like, still much faster.
More info would be nice, but just assuming you have fixed lengh string arrays and you want to remove from array a all items present in array b you could use a Func like this:
Code:
Private Function FilterArray(a() As String, b() As String) As String()
Dim i As Long
For i = LBound(b) To UBound(b)
a = Filter(a, b(i), False)
Next
End Function
Working example:
Code:
Dim a() As String, b() As String, i As Long
ReDim a(6) 'Will have 7 items (a,b,c,d,e,f,g)
ReDim b(3) 'Will have 4 items (a,b,c,d)
'Fill Arrays..
For i = 0 To 6
a(i) = Chr(97 + i)
Next
For i = 0 To 3
b(i) = Chr(97 + i)
Next
'Call the Func
FilterArray a(), b()
'Show Results: (e,f,g)
For i = LBound(a) To UBound(a)
Debug.Print a(i)
Next
EDIT: Now i remember this will also exclude partial matches, it will work just if those arrays contain fixed lengh strings, else: if A contains an item "DOG" and B contains an item "DO" then item A "contains" item B, and "DOG" will be removed from array A, and i don't think you want this to happen.
coolcurrent4u has only given us a poor description of what he wants. The datatype is really quite important as are the quantities (A & B).
The array elements are strings, and am processing millions of it. Specifically, am using all words from english dictionary as arrayA, and arrayB is words i specify
arrayA = this, is, one, array, with, strings, elements
arrayB = this, is, one
Remove all elements of arrayB from arrayA, so arrayC should be
arrayC =array, with, strings, elemenet
Programming is all about good logic. Spend more time here
thanks to all that contributed, i have finally come up with a solution.
please see the attached. if you can make it faster or optimize it, please let me know
Last edited by coolcurrent4u; Feb 4th, 2011 at 01:48 PM.
Reason: typo
Programming is all about good logic. Spend more time here
I think the easiest way to implement this would be with a database.
If you wanted to use some sort of custom system then here are my thoughts.
The big list of words is fixed so there a few things you can do to optimise it for searching. You can order it alphabetically and by word length, i.e. all the 6 letter words are grouped together in alphabetical order. You can then use a binary search to locate (or not) the word in the respective subset. This has the potential to locate (or not) any word by only comparing a few strings (A subset of 1'000'000 would require a max of 20 comparisons). You could also break down each subset further by indexing by say first letter but it would not be a great saving as binary searching is so effective on it's own.
Finally the array you build with the results does not have to be a string array, it could be a array of long that stores the indexes from the original big list.
memory wise...
OED2 defines over 600'000 words many of these are very uncommon so you could decide just to use a subset but if you go with the big list... If the average word length was 7 and you stored each one as a separate string it would take something like 14Mb ram (similar to a 24bpp bitmap 2200x2200). You could instead use ANSI strings and rather than have each word as a separate string you could divide the words up by their length and store each group in a single string, reducing the footprint to 4Mb or so. You would then use MidB$ to perform the binary search.
Last edited by Milk; Feb 4th, 2011 at 11:35 AM.
Reason: grammer
The line "ReDim Preserve strTemp(1 To k)" from post #13 is not a good idea with arrays your size because VB uses very stupid array resizing techniques with the Preserve keyword. You should instead try
Code:
ReDim strTemp(1 To iSourceUpper - iSourceLower + 1)
For i = iSearchLower To iSearchUpper
For j = iSourceLower To iSourceUpper
If InStrB(SourceArray(j), SearchArray(i)) <> 0 Then
k = k + 1
strTemp(k) = SourceArray(j)
DoEvents
End If
Next j
Next i
FilterArray = Join(strTemp, " ")
FilterArray = Trim$(FilterArray)
Originally Posted by Jmacp
Sounds like both your methods are pretty much the same, actually your seocnd method has more unneeded work.
They're not even remotely similar. The first method I mentioned does incremental memory reallocations while the second does a single one. The first reads each array object in its entirety to test for equality while the second only requires a hashing function which can read a small fraction of the object if well crafted. The second is wildly more efficient, asymptotically, though more complicated; the first is naive. I'm not sure how you could think they're similar, and it's a little insulting that you said the second does unnecessary work apparently without understanding it.
But as it turns out the OP's "array" is not an array at all, but a space-separated string of words. My above methods would need modification to work in this case, though the "Keep = True" case of post #13 is basically my first method.
The time you enjoy wasting is not wasted time. Bertrand Russell
Re: [RESOLVED] filter an array using another array vb6
I don't know what Jem is going to say but I suggest either work with a large string or work with a string array, don't split the large string for every search. A binary search would be much much quicker, although a well implemented hash function/hash table system has potential to be quicker still.
Re: [RESOLVED] filter an array using another array vb6
Your method certainly isn't the most efficient I can think of, but if you're happy with the time it takes, then great. The Keep=False case is particularly slow since you're Splitting potentially huge amounts of text and Joining them repeatedly which requires lots of data copying. The Keep=True case is probably quite quick since your first Filter will remove all elements but one, excepting duplicates. For instance, 'FilterArray("cow dog donkey", "dog")' returns "dog" and 'FilterArray("cow dog donkey", "dog donkey")' returns "". This is probably not the behavior you actually wanted, since FilterArray becomes a sort of boolean InStr, so I assume you've only been using Keep=False.
Are you unhappy with your current speed? You could avoid repeated Split and Joins by operating on an intermediate array instead. That should be faster.
Edit: In the Keep=True case, I agree with Milk--keeping a sorted string array of Source and applying binary search is far, far more efficient. In the Keep=False case, this doesn't work well.
Last edited by jemidiah; Feb 4th, 2011 at 02:27 PM.
The time you enjoy wasting is not wasted time. Bertrand Russell
Re: [RESOLVED] filter an array using another array vb6
i found a class of a hash table, but i can't or haven't found a binary search implementation in vb yet. Besides, there are many search algorithms out there like radix, balanced tree, binary tree search, interpolation etc, some of which is better that binary searching.
Before you suggested binary, i believe you have an implementation or have been using binary search.
Can you help me with some implementation?
How can i apply a hash table to my problem?
Programming is all about good logic. Spend more time here
Re: [RESOLVED] filter an array using another array vb6
Originally Posted by jemidiah
Are you unhappy with your current speed? You could avoid repeated Split and Joins by operating on an intermediate array instead. That should be faster.
can you throw more light to this or give sample code?
Programming is all about good logic. Spend more time here
Re: [RESOLVED] filter an array using another array vb6
Originally Posted by coolcurrent4u
i found a class of a hash table, but i can't or haven't found a binary search implementation in vb yet. Besides, there are many search algorithms out there like radix, balanced tree, binary tree search, interpolation etc, some of which is better that binary searching.
Before you suggested binary, i believe you have an implementation or have been using binary search.
Can you help me with some implementation?
How can i apply a hash table to my problem?
I would forget about hash functions / hash tables, it's more advanced than a binary search which you appear to be struggling with, and it won't be particularly easy to implement effectively in VB6
I have already linked to a thread with a couple of VB6 binary search implementations you could just copy a paste, as well as a Wikipedia article which discusses the algorithm in some detail. I can't really be more explicit.
Re: [RESOLVED] filter an array using another array vb6
ok now lets assume i want to use a binary search and filter elements
i have to sort the arrayA and arrayB
do the search removing every element of arrayB from arrayA
i definitely need a temporary array to copy to or do some sort of redim preserve
that been said, i think this will take more time and memory, and going to implement it anyway and see the difference. Am concerned about large arrayA and or large arrayB. Any more suggestions will be appreciated
Programming is all about good logic. Spend more time here
Re: [RESOLVED] filter an array using another array vb6
No you just need to sort array A, this just needs to be done once during the lifetime of the program. You would pass the binary search function, array A and a single string from array B. You would do this for every string in Array B. You actually do not have to redim preserve anything if you use the binary search function to populate a boolean (or bitset byte) array from which you build your final array, C.
Re: [RESOLVED] filter an array using another array vb6
Originally Posted by coolcurrent4u
can you throw more light to this or give sample code?
Sure:
Code:
Function FilterArray(ByVal Source As String, ByVal Search As String, Optional _
ByVal Keep As Boolean = True) As String
Dim i As Long
Dim SearchArray() As String
Dim IntermediateArray() As String
Dim iSearchLower As Long
Dim iSearchUpper As Long
If LenB(Source) <> 0 And LenB(Search) <> 0 Then
SearchArray = Split(Search, " ")
Else
FilterArray = Source
Exit Function
End If
iSearchLower = LBound(SearchArray)
iSearchUpper = UBound(SearchArray)
IntermediateArray = Split(Source, " ")
For i = iSearchLower To iSearchUpper
DoEvents
IntermediateArray = Filter(IntermediateArray, SearchArray(i), Keep, _
vbTextCompare)
Next i
FilterArray = Join(IntermediateArray, " ")
End Function
This should be far more efficient than what you had, but it still copies huge chunks of Source repeatedly, which is inefficient when Source is very large.
I agree with Milk that my hash table idea is too advanced (at least in VB6), unless this is incredibly important. The various other search methods are probably also more advanced than you're looking for, consider VB6's almost complete lack of a general purpose class library. I can only imagine the horror that would be a Red/Black tree implementation. (Off topic: for those interested, here's an amazing implementation of Red/Black trees in Haskell, a functional language. It's so brief and cute, it's remarkable. Also, the linked paper was written by a professor of mine's student or friend, I can't quite remember which.)
To elaborate slightly on Milk's solution, you would do something like this pseudovbcode:
Code:
'Assumes A is a string array instead of a single string
' A is sorted and probably globally available, for fast binary search.
'Similarly B is a string array.
Dim NotInB(LBound(A) To UBound(A)) As Boolean
For b = LBound(B) to UBound(B)
a = BinarySearch(ByRef A, B(b)) 'Returns index at which B(b) occurs in A, or LBound(A)-1 if not found
If a >= LBound(A) Then
NotInB(a) = True
NumFound = NumFound + 1
End If
Next b
ReDim NewA(LBound(A) To UBound(A) - NumFound) As String
For a = LBound(A) to UBound(A)
If NotInB(a) Then
NumSkipped = NumSkipped + 1
End If
NewA(a - NumSkipped) = A(a)
Next a
'return join(NewA, " ")
The time you enjoy wasting is not wasted time. Bertrand Russell