Private Sub Command1_Click()
MsgBox FilterData("0012-1189-4567", 1, "12")
MsgBox FilterData("0012-1189-4567", 3, "126")
End Sub
Private Function FilterData(strDg As String, ipos As Byte, strcut As String) As String
Dim idg() As String
Dim i As Integer
Dim icnt As Integer
Dim icntcut As Integer
Dim icut As Byte
Dim ilen As Byte
Dim tmpStr As String
If Trim(strDg) <> "" Then
idg = Split(strDg, "-")
tmpStr = ""
icnt = 1
icntcut = 0
For i = 0 To UBound(idg)
If InStr(1, strcut, Mid(idg(i), ipos, 1), vbTextCompare) <> 0 And InStr(1, tmpStr, idg(i), vbTextCompare) = 0 Then
If icnt <= 9 Then
tmpStr = tmpStr & Trim(idg(i)) & "-"
Else
tmpStr = tmpStr & Trim(idg(i)) & vbCrLf
icnt = 0
End If
icnt = icnt + 1
icntcut = icntcut + 1
End If
Next
FilterData = tmpStr
Else
FilterData = ""
End If
End Function
How to modify function FilterData to very fast if strdg = "0000-0001-...-9999" (10000 sets)
You won't get much faster. EVERY program would struggle with that many iterations.. unless it were written in optimized-binary and running on NASA computers..
The best you'll get is having some sort of strange lookup table.. so you don't have to keep calling Mid(), Instr(), etc. Other than that.. thats the best you'll get for speed.
I disagree. Mid() and InStr() can be tinkered with to improve speed; they don't necessarily cause a routine to be slow.
String concatenation using the ampersand (&) operator, on the other hand, is extremely slow and can be greatly speeded up.
I could help with that if I could understand what the filtering is intended to do.
For example
1. MsgBox FilterData("0012-1189-4567", 1, "12")
That mean select data that have 1 or 2 in First position (please see red character that is a first position). Result is "1189"
2. MsgBox FilterData("0012-1189-4567", 3, "126")
That mean select data that have 1 or 2 or 6 in third position (please see blue character that is a third position). Result is "0012-4567"
Data 10000 sets not duplicates then after filter it will not duplicates too.
In that case, this will be much, much faster:
Code:
Private Function FilterData(pstrText As String, plngPosition As Long, ByVal pstrFilter As String) As String
Dim strToken() As String
Dim lngCount As Long
Dim lngLen As Long
Dim i As Long
Dim lngOldPos As Long
Dim lngNewPos As Long
If Len(pstrText) = 0 Then Exit Function
strToken = Split(LCase$(pstrText), "-")
FilterData = Space$(Len(pstrText) + (UBound(strToken) + 1) \ 9)
pstrFilter = LCase$(pstrFilter)
lngNewPos = 1
lngOldPos = 1
For i = 0 To UBound(strToken)
If Len(strToken(i)) >= plngPosition And InStr(pstrFilter, Mid$(strToken(i), plngPosition, 1)) <> 0 Then
lngCount = lngCount + 1
If lngCount < 11 Then
Mid$(FilterData, lngNewPos, 1) = "-"
lngNewPos = lngNewPos + 1
Else
Mid$(FilterData, lngNewPos, 2) = vbNewLine
lngNewPos = lngNewPos + 2
lngCount = 1
End If
Mid$(FilterData, lngNewPos, Len(strToken(i))) = Mid$(pstrText, lngOldPos, Len(strToken(i)))
lngNewPos = lngNewPos + Len(strToken(i))
End If
lngOldPos = lngOldPos + Len(strToken(i)) + 1
Next
Erase strToken
FilterData = Mid$(FilterData, 2, lngNewPos - 2)
End Function
Last edited by Ellis Dee; Sep 8th, 2007 at 01:33 AM.
The edited code above should do what you want. I ran a quick benchmark against 10,000 4-digit numbers:
Old way: 1.57 seconds (5100 chars)
New way: 0.08 seconds (5098 chars)
Note that the new way doesn't include the trailing delimiter, which is either a dash (-) or vbNewLine. For the benchmark, the final delimiter was vbNewLine, which is why the new way is two characters shorter.
Please stop PM'ing me; I get email notifications whenever the thread gets bumped.
In neither you PM or your last post have you explained what is wrong. If you post the function call (even if as an attached project) that causes the problem, I'd be happy to take a look.
Expert tip: Turn off email notifications That used to annoy the hell out of me when I had free time at work.. go on a posting spree.. 15 minutes later my inbox was filled with emails from this forum
Please stop PM'ing me; I get email notifications whenever the thread gets bumped.
In neither you PM or your last post have you explained what is wrong. If you post the function call (even if as an attached project) that causes the problem, I'd be happy to take a look.
Expert tip: Turn off email notifications That used to annoy the hell out of me when I had free time at work.. go on a posting spree.. 15 minutes later my inbox was filled with emails from this forum
Did the results shown in post 11 come from my code? If so, those linefeeds look all wrong.
I am unable to reproduce the problem; maybe attach a text file containing the source string plus the actual function call and I'll be able to figure it out.
Did the results shown in post 11 come from my code? If so, those linefeeds look all wrong.
I am unable to reproduce the problem; maybe attach a text file containing the source string plus the actual function call and I'll be able to figure it out.
Please see full data in attatchment
After that test by call function
debug.print FilterData(strFulldata,1,"1235")
Since there is no dash between 5999 and 6000, they are being read as a single token equivalent to:
"5999" & vbNewLine & "6000"
The left 1 character of this expression is 5, so it qualifies as a valid token. That's why 6000 is appearing in both versions.
The fix is simple enough, and it will correct the output formatting as well.
Code:
Private Function FilterData(ByVal pstrText As String, plngPosition As Long, ByVal pstrFilter As String) As String
Dim strToken() As String
Dim lngCount As Long
Dim lngLen As Long
Dim i As Long
Dim lngOldPos As Long
Dim lngNewPos As Long
If Len(pstrText) = 0 Then Exit Function
pstrText = Replace$(pstrText, vbNewLine, "-")
strToken = Split(LCase$(pstrText), "-")
FilterData = Space$(Len(pstrText) + (UBound(strToken) + 1) \ 9)
pstrFilter = LCase$(pstrFilter)
lngNewPos = 1
lngOldPos = 1
For i = 0 To UBound(strToken)
If Len(strToken(i)) >= plngPosition And InStr(pstrFilter, Mid$(strToken(i), plngPosition, 1)) <> 0 Then
lngCount = lngCount + 1
If lngCount < 11 Then
Mid$(FilterData, lngNewPos, 1) = "-"
lngNewPos = lngNewPos + 1
Else
Mid$(FilterData, lngNewPos, 2) = vbNewLine
lngNewPos = lngNewPos + 2
lngCount = 1
End If
Mid$(FilterData, lngNewPos, Len(strToken(i))) = Mid$(pstrText, lngOldPos, Len(strToken(i)))
lngNewPos = lngNewPos + Len(strToken(i))
End If
lngOldPos = lngOldPos + Len(strToken(i)) + 1
Next
Erase strToken
FilterData = Mid$(FilterData, 2, lngNewPos - 2)
End Function
This bug crept in because I changed the way I handled the first matching token. Originally, I identified inside the loop whether or not the token was the first match found, and if so, I didn't add a "-" prefix. Later, I decided that that was wasted cpu cycles inside the loop, so I changed it to always stuff a dash before every matching token, and then at the very end strip off the first character.
In that buffer allocation line above, you'll note that it's adding Ubound() \ 9 instead of UBound() \ 10. It could probably change to \ 10; I set it to \ 9 to stay on the safe side and ensure there was enough space to handle the linefeeds.
That means that this bug you found would only surface if the following two conditions were met:
Every token is a match
The total number of tokens <= 10
So that's bug I never would have noticed. Kudos to you on your debugging.
Last edited by Ellis Dee; Sep 8th, 2007 at 09:04 PM.