[RESOLVED] Need a clever way to manipulate strings
OK, here's where I am.
I have a list of string texts stored in a db table.
I am reading an Excel workbook, and trying to see if what I am reading from Excel matches any of the strings in my db table.
The problem I have is I am kind of finding a match, but the wording is slightly different.
For example ...
Spreadsheet text is "Contractors to read and sign safety instructions in branch log book"
DB text equivalent is : "Contractors to read and sign safety instructions in the front of the branch log book"
As it stands I am using the INSTR function to search the DB text for the spreadsheet text. As the spreadsheet text is not an exact substring of the DB text, due to a slight wording change, the function is returning zero.
What I need to do is try and be a bit cleverer with how I manipulate the strings. I thought of maybe comparing only the first 15 or 20 characters, but the wording could be slightly different at the start, in the middle, or towards the end, so that would only get me so far.
Can anyone think of a more structured way I could tackle this ?
Maybe someone has done something similar before ?
Thanks in advance.
p.s. The INSTR function is the last thing you need when nursing the mother of all hangovers :(
Re: Need a clever way to manipulate strings
Some things:
- If you constantly call the Excel object to get strings, a lot more happens than you might think. Store the strings in a string array instead for faster access.
- Do LCase$ for both strings in the new string array and for the strings you're about to use from database.
- You could split the DB strings by space character and then compare word by word how many of the words are found and in which order.
- InStr is actually fast in what it does. The problems in speed come from other things.
That should get you started on getting your problem solved.
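A rough sketch of the word-by-word idea, assuming words are separated by spaces and punctuation does not matter. CountMatchingWords is a made-up name for illustration, not something from any existing project.

VB Code:
```vb
' Hypothetical helper: counts how many words of strSheet also occur
' as whole words in strDb, ignoring case. Word order is not checked here.
Public Function CountMatchingWords(ByVal strSheet As String, ByVal strDb As String) As Long
    Dim strWords() As String
    Dim strPadded As String
    Dim lngI As Long

    ' pad with spaces so InStr only finds whole words
    strPadded = " " & LCase$(strDb) & " "
    strWords = Split(LCase$(strSheet), " ")

    For lngI = LBound(strWords) To UBound(strWords)
        ' Split returns empty items for repeated spaces; skip those
        If LenB(strWords(lngI)) <> 0 Then
            If InStr(1, strPadded, " " & strWords(lngI) & " ") > 0 Then
                CountMatchingWords = CountMatchingWords + 1
            End If
        End If
    Next lngI
End Function
```

Dividing the result by the number of words in the spreadsheet text gives a score between 0 and 1. For the two example sentences in the first post, every spreadsheet word is also present in the DB text, so the score would be 1 even though InStr on the whole string returns zero. Where to put the "close enough" threshold is something you would have to tune against real data.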
Re: Need a clever way to manipulate strings
I'm actually using UCASE, so I know why you suggested LCASE.
I thought about using SPLIT; in fact, comparing word counts is about as rigid as I can get, I guess. It's still not going to be perfect, but I'll see where it gets me.
I am also already storing all Excel text in a string array. I know how slow Excel is/can be !
I made a project once using .ACTIVECELL.OFFSET(x,y).SELECT statements everywhere ! Not something you would ever do twice !
(That one is now re-written).
Cheers Merri.
Re: Need a clever way to manipulate strings
ha ha ha .. just as I was going to compare Agilaz's statement to Merri's ... he deleted his reply !
Spoilsport !
That was going to be a practical example of me practicing more string comparisons :)))
Re: Need a clever way to manipulate strings
Do you want to know of a technique that allows you to access string data as an Integer array? That way you could create a very fast and clean character-by-character checking function that could work pretty much the way you wish. No need for Split or any slow string handling.
The drawback is that the technique is crash prone if you hit stop or pause and if you make an error in handling array space, you get a crash as well. Basically you'd need to create the function in a separate project and test it real well before adding it into your main project. You also need a good bit of time to understand what you're actually doing. But if you're interested, I can post up some code (although I don't have VB6 on this computer).
Re: Need a clever way to manipulate strings
Well, that's a bit too time-consuming for what I am doing right now (I'm supposed to be finished with this today), but I am all for learning new things, and if it's going to be faster I'm all for that too.
I could always amend it at a later date.
If you get the time, that would be much appreciated.
Re: Need a clever way to manipulate strings
VB Code:
' in Form1.frm
Option Explicit
Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (ByRef lpvDest As Any, ByRef lpvSrc As Any, ByVal cbLen As Long)
' a regular VarPtr can't take an array variable so we need to add a support for that using this
Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr" (Var() As Any) As Long
' structure of a one dimensional safearray
Private Type SafeArray1D
    Dimensions As Integer ' dimensions, should always be 1 in this case
    FeatureFlags As Integer ' features (zero is good enough in basic use)
    ElementSize As Long ' size of one element in bytes (1 = Byte, 2 = Integer or Boolean...)
    LockCount As Long ' lock count (number of outstanding locks on the array; zero is fine here)
    DataPtr As Long ' pointer to location in memory where the data resides
    Elements1D As Long ' number of elements in the first dimension
    LBound1D As Long ' lower bound of the first dimension (zero is good and fast)
End Type
' declare SA variable that contains the information of our own array that we create
Dim SA As SafeArray1D
' declare an empty MyArray variable which we fake to use our own safearrayheader
Dim MyArray() As Integer
VB Code:
' Form1.frm
Private Sub Form_Load()
    ' initialize our own safearray
    With SA
        ' one dimension
        .Dimensions = 1
        ' one element is two bytes (we use Integer)
        .ElementSize = 2
        ' number of elements: insane (biggest positive Long value)
        .Elements1D = &H7FFFFFFF
        ' we set number of elements in this fashion so we do not need to change it constantly
    End With
    ' set MyArray to use our own safearrayheader!
    CopyMemory ByVal VarPtrArray(MyArray), VarPtr(SA), 4&
End Sub
Private Sub Form_Unload(Cancel As Integer)
    ' restore the original state of MyArray variable, otherwise VB will crash
    ' important! zero must be a Long, & character must be there or you'll crash!
    CopyMemory ByVal VarPtrArray(MyArray), 0&, 4&
    ' we didn't initialize MyArray before we set a custom header for it (ie. by using a ReDim)
    ' thus it didn't point anywhere in the memory: the pointer value was 0
End Sub
VB Code:
' Form1.frm
Private Sub Command1_Click()
    Dim strTesti As String
    Dim lngA As Long
    ' set some text to our test variable
    strTesti = "BBB! Terve!"
    ' show situation before any handling
    MsgBox strTesti, , "Before"
    ' note: one character is always two bytes, that's why SA.ElementSize is two!
    ' now we cheat a bit: set our MyArray to point into this string!
    SA.DataPtr = StrPtr(strTesti)
    ' let's change all B characters to A characters
    For lngA = 0 To Len(strTesti) - 1
        ' B's character code is 66, A is 65
        If MyArray(lngA) = 66 Then MyArray(lngA) = 65
    Next lngA
    ' let's see the end result!
    MsgBox strTesti, , "After"
End Sub
Translated the code from Finnish, hope I didn't miss anything :)
I had written an article with it, but I don't have the time to translate that right now.
Re: Need a clever way to manipulate strings
Brill. Thanks a lot for that. I'll have a look through it as soon as I get 5 minutes.
Thanks again :)
Re: Need a clever way to manipulate strings
Quote:
Originally Posted by TheBionicOrange
ha ha ha .. just as I was going to compare Agilaz's statement to Merri's ... he deleted his reply !
Spoilsport !
That was going to be a practical example of me practicing more string comparisons :)))
well, after hitting submit i saw that Merri already suggested the same (splitting and count matching words) :p
you can probably increase the accuracy by checking whether the matching words are in the same order.
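a rough sketch of the order check, assuming space-separated words and a simple left-to-right (greedy) scan rather than a proper longest-common-subsequence search. CountWordsInOrder is a made-up name for illustration.

VB Code:
```vb
' Hypothetical helper: counts the words of strSheet that can be found
' in strDb in the same order, ignoring case. Greedy scan: each word is
' searched for only after the position of the previous match.
Public Function CountWordsInOrder(ByVal strSheet As String, ByVal strDb As String) As Long
    Dim strA() As String
    Dim strB() As String
    Dim lngA As Long
    Dim lngB As Long
    Dim lngNext As Long

    strA = Split(LCase$(strSheet), " ")
    strB = Split(LCase$(strDb), " ")
    lngNext = LBound(strB)

    For lngA = LBound(strA) To UBound(strA)
        ' skip empty items caused by repeated spaces
        If LenB(strA(lngA)) <> 0 Then
            For lngB = lngNext To UBound(strB)
                If strB(lngB) = strA(lngA) Then
                    CountWordsInOrder = CountWordsInOrder + 1
                    ' the next word has to appear after this one
                    lngNext = lngB + 1
                    Exit For
                End If
            Next lngB
        End If
    Next lngA
End Function
```

a word that is not found after the previous match is simply skipped, so a few inserted or missing words (like "the front of the" in the example from the first post) do not stop the remaining words from being counted. because the scan is greedy it can undercount when a word is repeated in the DB text, so treat the result as an estimate.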
Re: Need a clever way to manipulate strings
Thanks both for your help. Now putting this thread to bed.
Agilaz ... I would have left it. Sometimes its good that people reiterate a point. Helps when other people read it :)
Re: [RESOLVED] Need a clever way to manipulate strings
OK, i'll consider that next time :)
and sorry for reactivating and hijacking your thread but Merri's "Integer String" impressed me and while playing around with it i had some ideas...
@Merri, first of all...great idea, hats off :bigyello:
as you already mentioned the problem is that you easily risk a crash if you do something wrong. i think the biggest problem is that you get an access violation when you try to access an element of the array that is outside your app's memory space.
VB Code:
Debug.Print MyArray(&H7FFFFFFF)
this is guaranteed to crash.
to avoid that you have to update Elements1D with the length of the string every time you assign a string to the array...
VB Code:
SA.DataPtr = StrPtr(strTesti)
SA.Elements1D = Len(strTesti)
this way VB can handle the error and you only get a runtime error 9 (Subscript out of range) when you try to access an element that is outside the string.
another advantage is that UBound(MyArray) will return the correct value.
you also have to avoid accessing the array after manipulating the string (strTesti) directly, because the string can be relocated, which will cause the array to point to random data or to an address outside of your app's memory.
so every time you manipulate the string (not the array) you will have to repeat this immediately...
VB Code:
SA.DataPtr = StrPtr(strTesti)
SA.Elements1D = Len(strTesti)
i think the best idea would be to wrap the whole thing up in a class to make it safe to use. that should be possible :)
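a rough, untested starting point for such a class could look like this. the class name CStringMap and its members (Attach, Detach, CharCode, Length) are made up for illustration; the declarations and the SafeArray1D layout are the ones from Merri's code above.

VB Code:
```vb
' CStringMap.cls (hypothetical class, a sketch only)
Option Explicit

Private Declare Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" (ByRef lpvDest As Any, ByRef lpvSrc As Any, ByVal cbLen As Long)
Private Declare Function VarPtrArray Lib "msvbvm60.dll" Alias "VarPtr" (Var() As Any) As Long

Private Type SafeArray1D
    Dimensions As Integer
    FeatureFlags As Integer
    ElementSize As Long
    LockCount As Long
    DataPtr As Long
    Elements1D As Long
    LBound1D As Long
End Type

Private m_SA As SafeArray1D
Private m_Chars() As Integer

Private Sub Class_Initialize()
    m_SA.Dimensions = 1
    m_SA.ElementSize = 2
    ' Elements1D stays 0 until Attach is called,
    ' so any access before that raises error 9
    CopyMemory ByVal VarPtrArray(m_Chars), VarPtr(m_SA), 4&
End Sub

Private Sub Class_Terminate()
    ' restore the empty array variable before VB cleans it up
    CopyMemory ByVal VarPtrArray(m_Chars), 0&, 4&
End Sub

' pass a String variable (not an expression) and call this again
' after every direct change to that string, since it may have moved
Public Sub Attach(ByRef strText As String)
    m_SA.DataPtr = StrPtr(strText)
    m_SA.Elements1D = Len(strText)
End Sub

Public Sub Detach()
    m_SA.DataPtr = 0
    m_SA.Elements1D = 0
End Sub

Public Property Get Length() As Long
    Length = m_SA.Elements1D
End Property

' zero based character access, bounds checked by VB
Public Property Get CharCode(ByVal lngIndex As Long) As Integer
    CharCode = m_Chars(lngIndex)
End Property

Public Property Let CharCode(ByVal lngIndex As Long, ByVal intCode As Integer)
    m_Chars(lngIndex) = intCode
End Property
```

usage would be something like creating a CStringMap, calling Attach with strTesti and then reading or writing CharCode(0) to CharCode(Length - 1). the price is a property call per character, which is slower than touching the array directly, so this trades some of the speed for safety.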
Re: [RESOLVED] Need a clever way to manipulate strings
... And you sound like JUST the man for the job :)
If you'd like to give myself and Merri a shout on completion, I'm sure we could shower your post with gratitude :)
Re: [RESOLVED] Need a clever way to manipulate strings
You can use the technique safely even without "making things sure" that way; I've been using the technique with speed in mind, so setting anything that is not strictly required is just wasted work. For someone getting used to the technique it might be good to set those extra things for safety.
Basically it is all up to what you prefer more: ease of use or speed. Ease of use slows things down, but makes developing a bit faster. Aiming for speed makes things a bit more complicated, but sometimes you can blow eyeballs out of your head when you figure out a cool algorithm that works hand in hand with the technique :)
Also, the technique can get very dangerous when you compile the program with all optimizations turned on, because then the code will never even check for valid array ranges. You can severely screw things up in memory.